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CHAPTER 1. INTRODUCTION 


The Amdahl Key Computer Laboratories K-1 family of supercomputers is designed to 
meet the needs of the modern large computer user. Equipped with an extremely large 
address space, very high I/O and memory bandwidths, and pipelined functional units, the K-1 
is ideally suited to tackling today’s highly compute intensive problems. At the same time, a 
simple, straightforward, yet powerful instruction set makes the K-1 architecture one of the 
best possible targets for optimizing compilers. A complete set of virtual memory features 
rounds out the architecture and makes it possible to run modern operating systems smoothly 
and efficiently. 


This manual describes the architecture of the central processing unit of the K-1l 
family. Different members of this family may contain different numbers of central processing 
units or have different physical memory sizes or attached I/O processors, but the central 
processing units all function identically. The main body of this manual contains all the 
information necessary for programming the K-1, with the exception of instruction timing 
information which is provided in Appendix C. 


The following sections describe the main features of the K-1 architecture and are 
followed by individual instruction descriptions. Instruction indexes are provided in 
Appendices A and B for quick reference. Appendix C gives detailed instruction timing 
information and Appendix D gives trap handling details along with an example of a trap 
handler. Appendices E, F and G give memory system, I/O system, and Front-End system 
specifics, respectively. Implementation dependent aspects of the K-1 architecture are noted 
in the text with square brackets ([]) and are references to Appendix H, which explains 
features particular to this implementation. 


The examples used in this manual to describe K-1 machine instructions conform to 
the K-1 assembly language syntax, which is described in a separate manual, The K-I1 
Assembly Language Reference Manual. 


Other documents of interest are: 
KC-126: K-1/IOP Software Interface Specification 


KC-109: The K-1 Assembly Language Reference Manual 
ANSI/NIEEE Std 754-1985: IEEE Standard for Binary Floating-Point Arithmetic 
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CHAPTER 2. K-1 ARCHITECTURE OVERVIEW 


2.1 Introduction 


The K-1 architecture is designed for general purpose/scientific computing. It has been 
optimized to allow extremely high performance implementations such as the Amdahl Key 
Computer Laboratories K-1 system. It provides an extensive set of high-precision, [EEE- 
compatible, floating-point instructions, as well as a full complement of integer, logical and ad- 
dressing operations. In addition, there are instructions to manipulate the virtual memory 
system, to control the caching of main memory data, and to control input/output. 


The K-1 computer can be divided into three main parts: the Central Processing Units 
(CPUs, or simply “processors”), the memory subsystem, and the I/O subsystem. Each 
CPU can be further divided into three main subsections: the instruction fetch and issue units, 
the register file, and the functional units. Figure 2-1 shows the major interconnections with- 
in a K-1 CPU, and its connections to the memory and I/O subsystems. 


The most central part of a K-1 CPU is the register file, containing up to 64 general- 
purpose registers of 64 bits each. The register file is used to store data items of all types. 
Functional units take their inputs from registers, or from constants that are part of the in- 
struction. Functional unit results are always stored into registers. Most instructions can 
specify three independent register addresses. For example, the add instruction adds two 
registers together and stores the result into a third register. 


There are five different types of functional units which process information from the 
registers: the integer, load/store, floating-point add, floating-point multiply, and floating- 
point divide/square root units. With the exception of a few special instructions that affect 
the internal state of the CPU, each instruction in the architecture is executed by exactly one 
functional unit. Timings for the individual functional units are given in Appendix C. The num- 
ber of each type of functional unit present in a given K-1 CPU is model-dependent; however, 
every K-1 CPU has at least one unit of each type. A program will give identical results re- 
gardless of the number of units; only the execution time will be affected. 


The K-1 main memory system consists of a very large, uniformly addressed memory 
space. Two large caches insulate each K-1 processor from the access time of main memory. 
One cache is used exclusively to hold instructions and the other to hold data. The caches op- 
erate transparently, but there is no hardware coherence between the instruction and data 
caches. The hardware does, however, maintain cache coherence among the data caches of all 
CPUs in a multiprocessor system. The architecture provides instructions for manipulating 
the caches and for updating memory. 
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Figure 2-1. The K-1 Processor and Its Interconnections 
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I/O in the K-1 architecture is performed through attached I/O processors that inter- 
face via a number of very high speed I/O busses. These busses are operated by I/O control- 
lers within the K-1, which transfer data directly to and from main memory. Instructions are 
provided to send control information to, and receive status from, the I/O subsystem. 


2.2 Conventions 


Every numeric data type in this manual is assumed to have its most significant bit on 
the left and its least significant bit on the right. Bits are numbered in ascending order from 
least significant to most significant, right-to-left, starting with zero. Bytes are numbered in 
the reverse order from most significant to least significant, left-to-right. This numbering 
scheme is commonly referred to as “big endian”. The only exception to this is related to the 
Byte Order Low-to-High feature which allows the ordering of bytes in memory to be in as- 
cending order from least significant to most significant, right-to-left (commonly referred to as 
“little endian”). However, even in this case, the bit numbering remains right-to-left. 


A data type can be zero-extended from its natural size to a larger size by appending 
sufficient high-order zero bits to make up the difference. For example, a byte can be zero- 
extended to 64 bits by appending 56 high-order zero bits. Similarly, a signed quantity can be 
sign-extended by appending sufficient high-order copies of the sign bit. For example, a 32- 
bit data type can be sign-extended to 64 bits by appending 32 high-order copies of bit 31. 


The letters K and M indicate 1,024 and 1,048,576 units of something, usually bytes. 


The notation m..n indicates a contiguous range of bits within a word or field. Bit num- 
ber m is the most significant end of the range, and bit number zn is the least significant end. If 
the name of the field is NAME, then such a range is indicated by NAME<m..n>. 


2.3 Data Types 


The K-1 architecture supports a number of data types and precisions. When a data 
item is in a register, it may be a signed or unsigned integer of length 8, 16, 32, or 64 bits, or a 
32-bit or 64-bit IEEE floating-point number. When in memory, only the precision of the data 
item (8, 16, 32, or 64 bits) is important. Quantities smaller than 64 bits are always right-ad- 
justed within a register. 


2.3.1 Integers 


Integers are 8-, 16-, 32-, 33-, 53-, or 64-bit quantities and may be either signed or 
unsigned. Signed integers are two’s complement values whose most significant bit is the 
sign bit, S. Figure 2-2 shows the format of a signed 64-bit integer. The 53-bit integer for- 
mat is used only for the integer multiply instructions; the 33-bit integer format is used only 
for the integer divide instructions. 
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Integer Data 


(63) 


Figure 2-2. Signed 64-Bit Integer Data Type 


2.3.2 Floating-Point Numbers 


Two precisions of floating-point numbers are supported: 32-bit (single) and 64-bit 
(double). Separate instructions are provided for each floating-point operation for each preci- 
sion. The formats conform to the ANSI/IEEE Standard 754-1985 floating-point single and 
double formats. 


Both floating-point precisions contain a sign bit, an exponent field, and a fraction 
field. The value represented is always a sign-magnitude mantissa times a power of two de- 
termined by the exponent. The mantissa consists of a “hidden” bit followed by the fraction. 
The floating-point format also provides certain special values, such as NaN (Not-a-Num- 
ber), infinity, negative zero and denormalized numbers (allowing for gradual underflow). 


The IEEE standard specifies two types of NaNs: signaling and quiet. Both types of 
NaNs must have a maximum exponent and a non-zero fraction. The K-1 recognizes a NaN 
as quiet if the most significant fraction bit is a one, and as signaling if it is zero. When given 
a NaN as input, all non-trapping floating-point operations (except compares and negate) will 
generate a quiet NaN. When given as operands to floating-point instructions, signaling 
NaNs cause an invalid operation trap if enabled. Whenever a NaN is output by a floating- 
point operation (except negate), it will be a quiet NaN in the form shown in Tables 2-1 and 
2-2 with the sign bit set to zero. The floating-point negate function is considered to be a da- 
ta moving operation, and thus never changes its input (except for the sign bit), and never 
Causes a trap. 


Most floating-point numbers are normalized, meaning that their most significant man- 
tissa bit (called the hidden bit) is a one. This bit, therefore, does not need to be present in 
the representation and is omitted. When computing the value represented by a floating- 
point number, the hidden bit must be reinserted. However, in order to extend the negative ex- 
ponent range and allow for gradual underflow, the floating-point format provides a class of 
numbers called denormalized numbers. These numbers, which are very close to zero in value, 
have a hidden bit of zero (and are therefore not normalized). 

32-bit (single) precision floating-point format numbers have a sign bit S, an 8-bit ex- 
ponent EXP, and a 23-bit fraction F (Figure 2-3). The value, V, represented by this format 
is computed as in Table 2-1. Note that Figure 2-3 shows this format as it would be stored 
in a 64-bit register, 32-bit precision floating-point numbers can be stored in 32 bits in memo- 
ry. 
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Unused 


(32) 


Figure 2-3. IEEE 32-Bit Precision Floating-Point Format 


Table 2-1. IEEE 32-Bit Precision Floating-Point Values 


NaN (signaling) 
NaN (quiet) 
(-1)° INFINITY 
(18 2EXP-127 (1p) 
(-1)8 27126 (o,f) 
(-1)5 0 


64-bit (double) precision floating-point format numbers have a sign bit S, an 11-bit i 
exponent EXP, and a 52-bit fraction F (Figure 2-4). The value, V, represented by this for- 
mat is computed as in Table 2-2. 


Figure 2-4. [EEE 64-Bit Precision Floating-Point Format 


7 


+ 


* y..y is any bit pattern on input, and is all ones if generated by the K-1. 


X..X is any non-zero bit pattern on input. Signaling NaNs are never generated by the K-1. 
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Table 2-2. IEEE 64-Bit Precision Floating-Point Values 


NaN (signaling) 
NaN (quiet) 


(-1)8 INFINITY 
0 <EXP < 2047 (-1)8 2EXP-1023 (7 Fy 
0 (-1)8 271022 (Fy 
0 (-1)5 0 


Note that due to the sign-magnitude nature of the floating-point formats, there are 
distinct representations for both positive and negative zero. 


Refer to Appendix I for details on floating-point computations and exceptions in the 
K-1. 


2.4 Instruction Formats 


The K-1 uses a three-address instruction format: most instructions require three reg- 
ister addresses, two of which specify source operands and the third of which specifies the 
destination for the result. The way in which the K-1 specifies the use of immediate con- 
stants as operands, however, is quite different from other machines. 


Whereas most machines use different opcodes to distinguish instructions which allow 
an immediate constant operand from instructions which only have register operands, the K-1 
distinguishes these types of instructions with a format code. The format code is a separate 
field in the instruction from the opcode; it regularizes the instruction set by separating the 
function of an instruction (e.g., addition, loading from memory, etc.), from the sources of its 
operands (constants or registers). For example, the add opcode with one format code will 
add two registers; with another format code it will add a register and an immediate 9-bit con- 
stant; and with another format code it will add a register and an immediate 36-bit constant. 


For instruction formats with immediate constants, the format code also controls the 
order of the constant operand and the register operand. That is, one format code will perform 
constant op register, and another will perform register op constant. This provides more flexi- 
bility and allows the K-1 to have fewer opcodes than would otherwise be needed. (Only one 
form of asymmetric instructions such as subtract and magnitude comparison is required). 


bb 


X..X 1$ any non-zero bit pattern on input. Signaling NaNs are never generated by the K-1. 


++ 


y.-y is any bit pattern on input, and is all ones if generated by the K-1. 
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Instruction formats are fully explained in Chapter 3. 


2.5 General Registers 


The register file comprises up to 64 registers containing 64 bits each, and are referred 
to as r0 to r63. Implementations of the K-1 architecture may support less than 64 registers 
[2-1]. All of the registers are general purpose and may be used to hold the operands or the 
results of any operation. The architecture treats all registers identically. 


2.6 Flags and Conditional Branching 


The Processor Status register contains seven flags, named f0 through f6, that can 
each store a binary value; an additional flag, f7, always contains the value one. These flags 
control conditional branching, and allow conditional execution of instructions in most formats. 
(See Chapter 3 for a description of instruction formats). A number of different instructions, 
such as the compare instructions, may set or clear a flag; a field in the instruction determines 
which flag is affected. In addition, boolean operations may be performed on flags and the re- 
sult may be written to a flag or to a register. During a conditional branch instruction, any flag 
may determine if the branch should be taken. The flag to be used and its polarity are speci- 
fied by fields in the branch instruction. There are actually no unconditional branch instruc- 
tions in the K-1 architecture. “Unconditional” branches are accomplished by specifying flag 
f7 as the branch condition. Similarly, in most instruction formats, unconditional execution can 
only be achieved by conditional execution with respect to f7. As will be seen in Chapter 3, 
conditional branching and conditional execution of instructions are specified in exactly the 
same fashion. 


2.7 ELF Flags 


Associated with each general register is an Early Load Fault (ELF) flag [2-2]. 
These flags are set and cleared by load and eload instructions, and may be interrogated by 
an echk instruction. The ELF flags are used to indicate that an eload instruction encoun- 
tered an illegal condition. (See the section on Early Load below for a description of the 
eload instruction and its differences from normal load instructions). The relf and welf in- 
structions can be used to save and restore the ELF flags. 


2.8 Memory 


Though most operations in a K-1 program will be of the register-to-register type, 
there must still be some way to move data in registers into and out of memory. The K-1 
load/store instructions serve this purpose. These instructions operate on a number of differ- 
ent sizes of data in memory, and are blind to the data types being moved. For example, a 64- 
bit integer and a 64-bit floating-point number are treated identically by the load/store instruc- 
tions. 
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Main memory can be thought of as a very large array of 8-bit bytes which are num- 
bered starting from zero. The architectural limit to the size of memory is the number of bytes 
that can be addressed in 48 bits, approximately 281 trillion bytes. Implementations of the 
K-1 architecture may support smaller address spaces [2-3]. 


2.8.1 Precision and Alignment 


When a load/store instruction references memory, it specifies the precision in which 
the operation is to be done: either 1, 2, 4, or 8 bytes. The address given must be aligned ac- 
cording to the precision. This means that the address for a 2-byte operation must be evenly 
divisible by 2; for a 4-byte operation, it must be evenly divisible by 4; and for an 8-byte oper- 
ation, it must be evenly divisible by 8. A load/store instruction with an address that is not 
properly aligned will cause a trap (except for eload instructions, as explained in the section 
on Early Load, below). 


The address given in a load/store instruction normally specifies the most significant 
byte of the data item. However, there is a bit in the Processor Status register called Byte 
Order Low-to-High, which, if set, causes the address to refer to the least significant byte of 
the data item. In either case, the address of a data item of any precision always refers to 
byte O of the data item. (Byte Order Low-to-High is described in more detail in the section 
on the Processor Status register.) 


Index {1,2,4,8} 


Base ——> 


Address 


Figure 2-5. Address Calculation for Load/Store Instructions 


2.8.2 Addressing Modes 


Addresses in load/store instructions are computed in one of two ways. The address 
may come directly from a single register or constant operand, or it may be calculated as the 
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sum of two terms: the base and the index. The index is multiplied by an additional factor, the 
memory precision (Figure 2-5). 

The memory precision is either 1, 2, 4, or 8, and is implicitly specified in the opcode of 
the load/store instruction; it cannot come from a register. The base and index can come from 
registers or constants depending upon the instruction formats and opcodes used. 


2.9 Processor States 


The processor can be in one of three modes of operation (states) depending on the 
settings of some bits in the Processor Status register and whether a trap has just oc- 
curred. These states are called user mode, supervisor mode, and Trap State. User mode 
has the least privileges; all applications will normally run in user mode. Supervisor mode is 
intended for use by the operating system; it provides more capabilities than user mode, such 
as the ability to set more bits in the Processor Status register, the ability to write data that 
is read-only in user mode, and the ability to execute privileged instructions. 


Whenever a trap is taken, the processor enters Trap State, which provides all the 
privileges of supervisor mode, plus the ability to read and modify special internal state infor- 
mation which aids in trap diagnosis and recovery. 


Trap State, user mode, and supervisor mode are described more fully in the sec- 
tions on Processor Status and Traps, Interrupts, and Machine Checks. 


2.10 Virtual Memory 


Data addresses (produced by the address calculations of load/store instructions) and 
instruction addresses are called virtual addresses, and are the only type of address that an 
application programmer ever uses. The virtual memory system allows multiple users to co- 
exist in a common physical memory by providing hardware support for sharing between us- 
ers, protection from other users, and efficient execution of modern operating systems such as 


the UNIX® operating system.! 


In order to facilitate porting application programs from an all-32-bit environment, a 
Small Address Compatibility mode is provided. In this mode, all user program and data ad- 
dresses are assumed to be 32-bit addresses: the high-order bits of any address are set to 
zero and the low-order 32 bits are retained. This mode is controlled by the Small Address 
Compatibility Mode bit in the Processor Status register. 


The term reference is used throughout this manual to mean a memory operation 
(generally a load or a store) at a given address. Memory references can be made in either 
supervisor mode or user mode. The mode of the reference controls the mapping of the virtu- 
al addresses used in the processor to the physical addresses used in the memory system. 
Supervisor mode references, used by the operating system, employ a different mapping 
scheme than user mode references. 


T UNIX isa registered trademark of AT&T. 
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2.10.1 Address Mapping 


The address of an instruction or of a data operand in a load/store instruction is trans- 
lated to a physical address before being sent to the memory system. This process is known 
as virtual-to-physical address mapping. 


The mapping of a contiguous block of memory, called a page, is specified by a single 
entry in a table, called a page table or page map. Specifically, a page is a 64K-byte block of 
addresses starting at an address that is evenly divisible by 64K. This is equivalent to say- 
ing that the address of the first byte of a page has 16 low-order zeros in binary. There are 
separate page tables for instructions and for data. 


When a memory data reference is a supervisor mode reference (see the section on 
Data Mapping, below), the mapping from virtual to physical addresses is the identity func- 
tion. In other words, there is no distinction between virtual and physical addresses for su- 
pervisor mode data references. 


As part of the mapping process, a 48-bit virtual address is divided into two pieces. 
The high-order 32 bits are called the virtual page number and the low-order 16 bits are called 
the page offset (Figure 2-6) [2-4]. The virtual page number is mapped into a physical page 
number while the page offset remains the same. The physical page number and the page off- 
set are concatenated to produce the physical address. 


Virtual Page Number Page Offset 


(32) (16) 


Figure 2-6. Virtual Address Format 


2.10.1.1 Instruction Mapping 


The instruction page table consists of eight entries, individually loadable with the 
privileged instruction lipage, that specify the mapping of eight different virtual regions for us- 
er mode instruction references. In supervisor mode, instruction mapping is done in an 
implementation-dependent fashion [2-5]. In user mode, instruction mapping is done using 
the instruction page table. Figure 2-7 shows the format of an instruction page table entry 
(the operand to a lipage instruction) [2-6]. 


The N field, sometimes called the Instruction Page ID, specifies which of the eight 
instruction page table entries is to be affected. If the valid bit, V, is a zero, then page table 
entry N is invalid and the rest of the fields are ignored. If the valid bit is a one, then page ta- 
ble entry N maps a virtual page range into a physical page range. The virtual and physical 
page ranges begin with the pages specified by the Virtual Page Number and Physical Page 
Number fields, respectively [2-7]. These page ranges must always be aligned on a bound- 
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ary which is a multiple of their size. Note that an instruction virtual page number is architec- 
turally limited to 25 bits, and therefore instruction addresses are architecturally limited to 41 
bits. 


Virtual Page Number Physical Page Number 


(25) (32) 


Figure 2-7. Instruction Page Table Format 


The extent of the page ranges is determined by the size specifier, SS, field. It indi- 
cates how many contiguous pages are referenced by this instruction map entry. The interpre- 
tation of this field is implementation-dependent (Table 2-3) [2-8]. 


Note that it does not matter which page table entry (which value of N) is used for a 
particular mapping entry, but mapping a given virtual address with more than one page table 
entry will produce unpredictable results. Therefore, the virtual page ranges specified by the 
valid page table entries must be non-overlapping. 


Table 2-3. Instruction Page Table Size Specifier 


# of Pages Mapped 


0-7 See Implementation 
Dependencies, Appendix H 


2.10.1.2 Data Mapping 


It is intended that the operating system and applications program(s) occupy separate 
address spaces. Whether a reference is treated as a supervisor mode reference or a user 
mode reference is a function of the User Mode Load, User Mode Store, and User Protec- 
tion bits of the Processor Status register, as well as the type of reference (Table 2-4). 
When the processor is in user mode, it can only make user mode references to memory. But 
when the processor is in supervisor mode (and not in Trap State), it can make either user 
mode references or supervisor mode references depending on the settings of the User 
Mode Load and User Mode Store bits and the type of reference. This allows the operating 
system to perform memory references with the user’s address mapping. For example, if the 
operating system wanted to copy data into the user’s area at a virtual address provided by 
the user (such as in response to an I/O request), it could set the User Mode Store bit; load 
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instructions would then use the operating system’s address mapping, but store instructions 
would use the user’s address mapping and protection. Using the user’s protection prevents 
malicious users from providing invalid addresses. 


In Table 2-4, all load, eload, Idecc, Idnecc, loadcpu, and pcl instructions are consid- 
ered to make “load” (i.e., memory read) references, and all store, storecpu, dflush, and zcl 
instructions are considered to make “store” (i.e., memory write) references. The swat in- 
struction, which does both a memory read and a memory write, is considered to be both a 
load and a store. swat instructions require the User Mode Load and User Mode Store 
bits to have the same value or the results will be unpredictable. 


Table 2-4. Memory Reference Modes 


Load Store 
Reference Reference 
Mode Mode 


Supervisor | Supervisor 

Supervisor User 
User Supervisor 
User User 


User mode references always use the data page table to perform the mapping from 
virtual to physical addresses and to determine the legal access modes (read or write) and 
sharability of that page. Supervisor mode references do not use the data page table; both 
read and write supervisor mode references are always permitted. Supervisor mode refer- 
ences are always shared. Only shared pages participate in the multiprocessor cache-coher- 
ence scheme. Thus, all supervisor mode references and any user mode references which are 
designated as shared in the data page table are cache-coherent. 


The page control bits of a data page table entry (described below) control the types of 
access (read and write) that are allowed for user mode references to that page. In addition, 
they control whether the page can be shared among processes. For user mode references 
when the User Protection bit in the Processor Status register is off, the page control bits 
are ignored: both read and write references are permitted if either type of reference is permit- 
ted. This allows the supervisor to modify user data without the user’s access restrictions. 


Data page table information is stored differently from instruction page table entries. 
The data page table is a cache, while the instruction page table is a fully-associative memory 


[2-9]. 


Each page table entry contains the Process Key, a virtual tag, the physical page 
number, and a number of page control bits. The data page table is addressed by hashing the 
virtual page number being mapped and the Process Key field of the Processor Status reg- 
ister [2-10]. Therefore, the same virtual page numbers for two different processes (which, of 
course, must have different Process Keys) are probably mapped by different entries in the 
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page table. It is not necessary to evict one entire process to load the page table for another 
because the Process Key values stored with the page table entries are used to distinguish 
the entries for different processes. 


A data map miss trap occurs on a user mode reference when the virtual address cal- 
culated by a load/store instruction does not have a valid entry with the proper Process Key 
and address tags in the data page table. 


The Idpage instruction is used to load entries into the data page table. Figures 2-8 
and 2-9 show the formats of the operands to a Idpage instruction [2-11]. A table entry is 
written to map the given Virtual Page Number to the given Physical Page Number. The 
Virtual Page Number and the Process Key field of the Processor Status register are 
hashed to produce the data page table address to be written. 


Virtual Page Number Unused 


(32) (16) 


Figure 2-8. Virtual Page Number 


The R and W page control bits define the types of user mode references permitted to 
the page being mapped (unless overridden by clearing the User Protection bit of the Pro- 
cessor Status register; see below). If R is set, then load, eload, and pcl instructions are 
permitted. If W is set, then store, dflush, and zcl instructions are permitted. swat instruc- 
tions, which do both a read and a write, require both R and W to be set. If neither bit is set, 
the page is invalid and no reference to that page is permitted in user mode regardless of the 
setting of the User Protection bit. Note that referencing an invalid page will cause an ille- 
gal access trap, not a data map miss trap. 


The S bit indicates that the page is shared. The unused fields in the data page table 
format are reserved for future expansion and must be zero. An illegal access trap occurs 
when the user virtual address calculated by a load/store instruction does not have the proper 
page control bits for the requested reference. 


6 44 11 0000 
3 87 65 3210 


Physical Page Number 


(32) 


Figure 2-9. Data Page Table Format 
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2.11 Cache Effects 


Each K-1 processor contains two independent cache systems, one for instructions 
and the other for data. The caches are hardware managed, but without regard for whether a 
single data item exists in both caches at the same time. Therefore, when writing applica- 
tions that depend on the execution of data as programs, that write into the instruction 
stream, or that require that main memory contain an up-to-date copy of the data (such as I/O 
processing), the programmer is required to use instructions that manipulate the caches. 
These instructions can be used to remove data from either cache and/or force modified cache 
data to be written back to main memory. 


2.11.1 Cache Organization and Addressing 


The instruction and data caches are both one-way set-associative, also known as di- 
rect-mapped. They are addressed by low-order address bits [2-13] and contain program- 
mer-invisible tag fields that indicate the high-order address bits of the data that currently is 
cached at that location. Because they are one-way set-associative, two locations whose 
low-order address bits agree cannot reside in the same cache at the same time. 


In order to improve efficiency, data is organized in the caches in groups of consecutive 
addresses called cache lines (or sometimes just lines). Lines are the minimal tagged unit 
in the cache. Transfers between the memory system and a CPU are always of entire cache 
lines, but are broken up into smaller pieces called memory transfer sub-lines (or just sub- 
lines). A sub-line can be thought of as the largest unit that can be read from or written to 
the cache at one time (or as the width of the memory bus). For example, writing a cache line 
into the cache involves a series of writes of individual sub-lines. The size of cache lines and 
sub-lines are implementation-dependent, but are always powers of 2 [2-12]. Cache lines 
and sub-lines are always aligned on a boundary that is a multiple of their size. 


The instruction cache is addressed by physical addresses, while the data cache uses 
a combination of virtual and physical addresses. That is, the instruction cache is addressed 
and tagged by the physical address, whereas the data cache is addressed by the low-order 
bits of the virtual address, but is tagged with the physical address. This means that lines of 
data at the same virtual address in two different processes will compete for the same place 
in the data cache. 


An additional implication of this data cache organization is that shared data (data that 
is referenced by more than one process) should be addressed by all sharing processes with 
addresses that agree in sufficient low-order virtual address bits to ensure that they refer- 
ence the same location in the data cache. (If this is not done, then the same data item could 
potentially exist at more than one place in the data cache, and the locations would not be 
guaranteed to contain the same data.) The number of low-order address bits that must 
agree is dependent upon the size of the data cache. In order to allow for future increases in 
the size of the data cache, processes should share data at addresses that agree in the low- 
order 27 bits (allowing for a maximum data cache size of 128M bytes). 


It is also possible for processes to share data at arbitrary virtual addresses. Howevy- 
er, this requires that the operating system “‘sweep” the cache after running such a process. 
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It also requires that two such sharing processes never be run simultaneously on two differ- 
ent processors in a multiprocessor system. Naturally, this will result in some performance 
penalty. 


The sizes of the instruction and data caches are implementation-dependent [2-13]. 


2.11.2 Instruction Stache 


The K-1 provides an additional level of instruction caching called the instruction 
stache. This much smaller and faster cache lies between each CPU and its instruction 
cache. The instruction stache is addressed by low-order instruction address bits, but instead 
of recording the entire physical page number, it remembers only the Instruction Page ID 
(the N field in an instruction page table entry). This means that the entire instruction stache 
must be invalidated whenever the instruction page table is changed. The entire instruction 
stache may be invalidated by executing an iskill instruction, or by entering or exiting Trap 
State (i.e., by trapping or by returning from a trap). 

The line size of the instruction stache is not necessarily the same as that of the in- 
struction cache. The total size and the line size of the instruction stache are both implemen- 
tation-dependent [2-14]. 


2.11.3 Cache Coherence 


The K-1 implements a multiprocessor cache-coherence scheme that allows CPUs in 
the same system to share data without software knowing about the effects of caches. That 
is, if one processor (A) modifies some data, cache coherence guarantees that the next pro- 
cessor (B) to read that data will see the modified value (even if the old value was in B’s da- 
ta cache prior to A’s write). It is still necessary for software running on different CPUs to 
synchronize the modification of shared data. Otherwise, two CPUs might modify the same 
data at nearly the same time. Since there is no guarantee (without synchronization) which 
CPU would modify the data first, the result would be unpredictable. 


The K-1 cache-coherence scheme is handled through the memory system. It sup- 
ports a single-writer, multiple-reader model. That is, any number of processors can have a 
read-only copy of a particular piece of shared data in their data caches at the same time. But 
as soon as one processor tries to modify shared data, it must be granted sole ownership. 
When another processor tries to read the modified result, the writer will lose its sole owner- 
ship (and its write privileges). 

The K-1 distinguishes between shared data and non-shared data; only shared data 
participates in the cache-coherence scheme. As explained in the section on Data Mapping 
above, all supervisor mode references are shared; user mode references are shared only if 
they are marked as shared in the data page table. 


The instruction caches do not participate in the cache-coherence scheme. Cache co- 
herence between instructions and data must be provided by software, even between the in- 
struction cache and the data cache of a single processor. 
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The I/O system does not participate in the cache-coherence scheme and because of 
this, care must be taken when writing low-level code. Before initiating an I/O write, the data 
being written must be flushed from the data caches of all processors. Before an I/O read 
completes, the old “stale” data in all the processors’ caches must be flushed (so that the 
new data can be read from memory). A special shared version of the dflush instruction is 
provided for this purpose. This shared dflush is “broadcast” to all the CPUs in a multipro- 
cessor system, and can flush a given cache line from all processors’ data caches at once. 
(See the dflush instruction description for more details.) 


2.11.4 Interprocessor Synchronization 


The K-1 has two semaphore schemes to allow different processors to synchronize 
their activities. First, the swat instruction is an atomic operation on shared memory. Sec- 
ond, the I/O system implements a limited number of registers whose bits can be atomically 
set or cleared with the wios instruction. Both of these methods can be used to implement bi- 
nary semaphores, which can be used for interprocessor synchronization. 


The two independent synchronization schemes use different system resources. 
While the swat instruction allows the implementation of a huge number of semaphores 
(limited only by the size of memory), its use of the memory system (which is optimized to 
transfer large amounts of data) can make it slow. swat instructions must also “compete” 
for memory bandwidth with other operations. The I/O system implements a smaller number 
of semaphores, but they are faster to access and their use does not consume memory band- 
width. The semaphores used most frequently by the operating system should be implement- 
ed in the I/O system. See Appendix F for more details on the I/O system. 


2.12 Program Execution 


K-1 instructions are either 32 or 64 bits long and reside in main memory at addresses 
that must be properly aligned; a 32-bit instruction must start at an address evenly divisible 
by 4 and a 64-bit instruction must start at an address evenly divisible by 8. Program execu- 
tion involves the repeated fetching of 64-bit instruction words, and the issue of instructions 
from them. (Instruction words start at an address that is evenly divisible by 8.) 


Each instruction word contains either two 32-bit instructions or one 64-bit instruc- 
tion. If there are two 32-bit instructions packed into an instruction word, they are designated 
as I0 and I1, with I0 occupying the most significant half (lower address) and I1 the least sig- 
nificant (higher address). If there is one 64-bit instruction in an instruction word, it is desig- 
nated as IO, and there is no I1 instruction. 


The byte address of the current instruction being fetched is contained in the Program 
Counter (PC), which is incremented by 4 or 8 depending upon the size of the instruction be- 
ing executed. Programs are executed sequentially until a trap condition or a branch instruc- 
tion is encountered. Branch instructions select a new program counter either as an absolute 
address or as a signed offset from the address of the instruction word containing the branch 
instruction. If a new program counter points to an [1 instruction (i.e., its bit 2 is set) then the 
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entire 64-bit instruction word will be fetched, but only the I1 instruction in that word will be 
executed. 


There are two disable bits, D0 and D1, associated with each PC. The DO bit inhibits 
the execution of the IO instruction, and the D1 bit inhibits the execution of the I1 instruction. 
Normally, when an instruction word is first fetched, both the DO and D1 bits will be clear. 
The DO bit is set after the I0 instruction has executed, and the D1 bit is set after the I1 in- 
struction has executed. For 64-bit instructions, both bits are set when the instruction exe- 
cutes. Note that a branch to an I1 instruction actually branches to the IO instruction (since 
the K-1 always fetches 64-bit instruction words), but sets the DO bit, disabling the I0 half of 
the instruction word. The DO disable bit can thus be seen to be the same as bit 2 of the pro- 
gram counter. The disable bits are used primarily in the features of the architecture relating 
to trapping. When an instruction word is fetched from a PC with both disable bits set (as 
can happen due to delayed branching or exts instructions), instruction map miss traps are 
suppressed. 


2.12.1 Delayed Branching 


Branch instructions have the ability to disable the execution of instructions from the 
subsequently-fetched one or two 64-bit words. This is done by specifying the conditions un- 
der which the two instruction words following a branch will be executed: in case of branch, in 
case of fall-through, or always (in case of branch or fall-through). The execution of a few 
more instructions after a branch can be thought of as delaying the actual branching (as op- 
posed to delaying the branch decision). Delayed branching allows compilers to compensate 
for the execution time penalty incurred by branches in implementations of the architecture. In- 
structions whose execution is controlled by a branch are called delay instructions. The first 
64-bit word fetched following a branch instruction is called the first delay slot, and the 64- 
bit word fetched following that is called the second delay slot. 


Branch instructions control whether or not instructions that occupy the first and sec- 
ond delay slots are to be executed. For each delay slot there are three cases: execute the in- 
structions in that slot if the branch is taken (branch), not taken (fall-through), or regardless 
of whether the branch is taken (always). The letters b, f, and a are used to refer to these 
cases, respectively. Of the nine possible independently controllable cases for two delay 
slots, eight may be specified by the delayed execution control field (DC) of a branch instruc- 
tion (Table 3-2). The ninth case, ba, indicates that the instructions in the first slot are exe- 
cuted only on the branch path, while those from the second slot are executed always (on both 
paths). This is thought to be the least useful case, and is practically equivalent to the ab 
case, which is provided. 


There are no restrictions on the types of instructions that can be executed as delay in- 
structions. In particular, another branch instruction may be executed as a delay instruction. 
In order to clearly describe the sequence of instructions that will be executed in such a case, 
it is necessary to use a slightly unusual model of how program branching works. The key 
rule to remember is that the execution of each 64-bit instruction word determines the 
address of the third following 64-bit instruction word to be fetched. It is also important to 
know that all delay instructions are fetched, even if they are not going to be executed. Such a 
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nullified delay instruction word is effectively disabled by asserting both the DO and D1 dis- 
able bits associated with that instruction’s PC. 


When the first in a sequence of branch instructions is encountered, the addresses of 
the next two instruction words have already been determined. If the branch is taken, the ad- 
dress of the third following instruction word fetch will be the branch target address. Howev- 
er, it is the execution of the first delay instruction that controls the fetch address to be used 
immediately after the target of the branch (the fourth following instruction word fetch). 


As an example consider the following program: 


jump XYZ, aa 
add =%r0,%rl,%r2; ~— br QQQ, bb 
nop 


C2 Ot 


AYZ: move %rl,%r2 


QQQ: 


The sequence of instruction fetches will be from addresses A, B, C, XYZ, QQQ, and will 
then continue with QQQ+8, QQQ+16, etc. Note that the delay instructions of a branch are 
not necessarily located in the addresses sequentially following the branch. In the above ex- 
ample, the first delay slot for the branch to QQQ is at C, but the second delay slot for that 
branch is at XYZ. 


2.12.2 Conditional Execution 


Most of the K-1 instruction formats allow the specification of a conditional execution 
field: instructions can be conditionally executed based on the value of a flag. Since flag f7 is 
always true, instructions can always be unconditionally executed. The benefit of conditional 
execution is that in many cases branches can be avoided by conditionally executing the code 
that would have been jumped around. For a small number of conditionally-executed instruc- 
tions, this is more efficient than branching; the exact break-even point depends upon the spe- 
cific instruction timings and interlocks. (See Appendix C for more details on these topics.) 
Conditionally-disabled instructions will never trap, even in cases such as illegal instruc- 
tion/privilege violation traps. Note that traps that occur before an instruction is interpreted, 
such as instruction page map misses and trace traps, will still affect conditionally-disabled 
instructions. 


2.13 Early Load 


In order for compilers to produce highly efficient code despite the latency of load in- 
structions, they must be able to move the load instructions “backwards” in the code, i.e., 
well before the use of the load’s result. However, a load may have the serious side effect of 
trapping. If a load instruction was moved before a check for an illegal address, for example, 
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the load result would probably have been ignored after the check and so an illegal address 
trap would have been superfluous and should have been suppressed. 


The early load mechanism is designed to solve this problem. For each load instruc- 
tion there is a corresponding eload instruction. The eload instruction performs the same ad- 
dress calculation and mapping function as the corresponding load, and, if there is no trap, 
loads the proper value into the destination register. eloads, however, behave differently if 
there is an error. For illegal access errors, illegal address errors (when the Early Load 
Alignment Trap bit in the Processor Status register is not set), and supervisor mode non- 
existent memory errors, the eload will not trap but will instead store zero into the destina- 
tion register and set the ELF flag corresponding to that register. For all other types of er- 
rors, eload instructions behave the same as the corresponding load. In this case, it is then 
up to the operating system to either abort the program or set the result and ELF flag accord- 
ingly and resume its execution. 


The ELF flag is cleared by any load that does not trap, and set by any load that does 
trap. eload instructions set the ELF flag identically to the corresponding load, whether the 
eload traps or not. A load that might have an unwanted side effect should be replaced with 
an eload. 


In order to verify that the results of an eload are indeed valid, the programmer may 
use the echk instruction. This instruction tests one ELF flag and traps if it is set. It should 
be noted that echk instructions may be omitted for efficiency; correct programs will always 
work either with or without echk instructions, although the behavior of incorrect programs 
can be radically different. 


In Trap State (see below), load instructions do not affect the ELF flags and eload 
instructions act the same as loads. The relf and welf instructions can be used either in or 
out of Trap State to save and restore the ELF flags. 


2.14 Processor Status 


Much of the K-1 processor’s operation is controlled by a register called the Proces- 
sor Status register. The rps, wps, srm, and spl instructions read and write the Processor 
Status register. In user mode, only the high-order 32 bits of the Processor Status register 
can be modified. The Processor Status register is overridden while in Trap State and a de- 
fault value for some of the bits is used. The format of the Processor Status register is given 
in Table 2-5. Unused bits of the Processor Status register read as zero, but application 
programs must not rely on this fact since this may not be the case in other versions of the K- 
1 architecture. Attempts to set (write a ‘1’ into) unused bits of the Processor Status regis- 
ter cause a trap [2-16]. 


The Processor Version Number<7..0> field is a read-only field giving the imple- 
mentation level for this model of the K-1 processor. It may be used by operating systems to 
tailor code for specific implementations. 


The f<6..0> field refers to the six flag bits f6 through f0 (where f<6> refers to f6, 
etc.). These bits are primarily set and cleared by compare and flag instructions and tested by 
conditional branch instructions. They can also be used to control conditional execution in 
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most instruction formats. Flag f7 is always set and is therefore not part of the Processor 
Status register. 


Table 2-5. Processor Status Register 


Rone Act as if 0 in 
apenen Trap State 


Processor Version Number<7..0> 
f<6..0> 

Arithmetic Exception Flags<4..0> 
Arithmetic Trap Enables<4..0> 
Integer Divide Trap Enable 
Rounding Mode<1..0> 

Byte Order Low-to-High 


Process Key<12..0> 

Trace Enable 

Trace Pending 

User Protection 

User Mode Store 

User Mode Load 

Small Address Compatibility Mode 
Early Load Alignment Trap 
Processor Priority Level<3..0> 


Table 2-6. Arithmetic Trap Enables and Exception Flags 


} Bit | Trap/Exception Type 


Invalid Operation 


Division by Zero 
Floating-Point Overflow 
Floating-Point Underflow 
Inexact Result 


If certain exception conditions are detected during floating-point calculations, then 
one or more of the Arithmetic Exception Flags will be set, and an arithmetic trap may occur 
depending upon the Arithmetic Trap Enables<4..0> field. These five bits independently en- 
able floating-point exceptions to trap as shown in Table 2-6. Setting Arithmetic Exception 
Flags or Arithmetic Trap Enables with the wps (write processor status) instruction will 
not cause a trap. The Arithmetic Exception Flags can be cleared only with the wps instruc- 
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tion. Note that floating-point instructions issued in Trap State can affect the Arithmetic Ex- 
ception Flags. If preservation of the state of the trapping program is desired, then care must 
be taken in Trap State to save the Processor Status register before issuing any floating- 
point instructions. 


Although the K-1 supports floating-point traps, they are not IEEE compatible. It 
should be noted, however, that the IEEE Floating-Point standard does not require any sup- 
port for traps. 


The Integer Divide Trap Enable bit controls the ability of integer divide instructions 
(divsst and divssr) to generate traps when division by zero is attempted. If this bit is not 
set, then integer divides will not trap on division by zero and will just return 0. 


During floating-point calculations, it is sometimes necessary to store an inexact re- 
sult. The process by which the exact answer is transformed into an inexact result is called 
rounding. Four different methods of rounding are available as controlled by the Rounding 
Mode<1..0> field and described in Table 2-7. The srm instruction is provided to change the 
Rounding Mode field of the Processor Status register. 


Table 2-7. Rounding Modes 


Field Value | Rounding Mode 


Round to Nearest Nearest 


Round Towards Zero Truncate 
Round Towards Positive Infinity Ceiling 
Round Towards Negative Infinity Floor 


The Byte Order Low-to-High bit controls the low-order address bits generated dur- 
ing load/store instructions. If this bit is set, then the low three address bits (2..0) are trans- 
formed based on the precision of the load/store instruction (Table 2-8). Supervisor mode 
references are not affected by the setting of the Byte Order Low-to-High bit. 


The Process Key<12..0> field distinguishes the virtual page numbers of the current- 
ly running process from the identically-numbered pages of another process in the data page 
map. 

The Trace Enable and Trace Pending bits control trace trapping. Trace Enable en- 
ables the setting of Trace Pending after the execution of the current instruction. Trace 
Pending actually causes a trace trap. These bits can be used to single step a program, or to 
advance a program past a breakpoint. 


The Trace Enable bit can only be cleared or set by a wps instruction when the pro- 
cessor is in Trap State. The Trace Pending bit can only be cleared by a wps instruction 
when the processor is in Trap State, and can be set by a wps instruction in Trap State. 
But most importantly, the Trace Pending bit is set whenever a non-Trap State instruction 
is executed while the Trace Enable bit is set. A trace trap occurs when Trace Pending is 
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set, the processor is not in Trap State, and the instruction about to be issued has not been 
disabled by a branch or by exts. (Trace Pending will cause a trace trap on an instruction 
which is disabled by conditional execution.) 


Table 2-8. Low-to-High Byte Addressing 


Input <2..0> Output <2..0> 


Bile 


NADUNAPWNK © 
Ornwhkunanan 


ARNO 
ON FD 


cee ee 


4-byte 0 4 
4 0 


a 


The User Mode Load and User Mode Store bits control whether memory referenc- 
es are treated as user mode references or supervisor mode references. This topic is dis- 
cussed in the section on Data Mapping, and is illustrated in Table 2-4. Note that the Byte 
Order Low-to-High and Small Address Compatibility Mode bits affect only user mode ref- 
erences — both bits are disabled in supervisor mode. Since the User Mode Load and User 
Mode Store bits always act as zero in Trap State, there can only be supervisor mode ref- 
erences in Trap State. 


The User Protection bit controls whether user mode references use the protections 
provided in the data page table, or whether they get supervisor “over-ride” privileges. This 
topic is discussed in the section on Data Mapping. 


The processor is considered to be in user mode only if all three of the User Mode 
Load, User Mode Store, and User Protection bits are on. If any of these bits is off, the 
processor is in supervisor mode. Any attempt to switch from supervisor mode to user 
mode other than when in Trap State will produce unpredictable results. Note that even 
when the processor is in supervisor mode (and not in Trap State), it can still make user 
mode references to memory by controlling which of the User Mode Load, User Mode 
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Store, and User Protection bits are set. Instruction references are in user mode only if the 
processor is in user mode (all three of the User Mode Load, User Mode Store, and User 
Protection bits are on). 


The Small Address Compatibility Mode bit forces both user mode instruction and us- 
er mode data virtual addresses to be truncated to 32 bits. The higher-order bits of any user 
address will be set to zero. This mode is provided to ease the burden of porting programs 
from a 32-bit environment. Supervisor mode references are not affected. 


The Early Load Alignment Trap bit enables eload instructions to trap on misaligned 
address errors (instead of just setting an ELF flag). The normal operation, if this bit is clear, 
is for eloads to “ignore” the alignment error (setting the ELF flag and returning 0). If this 
bit is set, however, an eload with a misaligned address will take an illegal address trap (as 
would the corresponding load instruction). 


The Processor Priority Level<3..0> field gives the interrupt priority level of the 
processor. All interrupts have an associated priority level. If the priority level of an interrupt 
is higher than the priority level of the processor, then the interrupt will happen. If the inter- 
rupt priority is the same or lower than the priority level of the processor, then the interrupt 
will remain pending. The interrupt will occur as soon as the processor lowers its priority to 
be less than that of the interrupt. The Processor Priority Level field is encoded with zero 
as the highest priority level (masking out all interrupts), and fifteen as the lowest level 
(allowing all interrupts). The spl instruction is provided to change the Processor Priority 
Level field of the Processor Status register. 


2.15 Input/Output 


I/O controller(s) perform input/output via direct memory access over a number of high- 
speed I/O busses. The sequencing of I/O operations is controlled by messages in memory 
and by interrupts. The rios and wios instructions are provided to exchange control and sta- 
tus information between a CPU and the I/O subsystem. 


2.15.1 I/O Interrupts 


Interrupts from the I/O system are assigned priority levels from zero to fifteen, with 
zero being the highest priority. The level of the highest-priority, pending I/O interrupt is 
compared with the Processor Priority Level<3..0> field of the Processor Status register 
and, if the I/O interrupt has higher priority (i.e., is numerically less than the processor’s prior- 
ity), the processor will perform an I/O interrupt sequence. The Trap Summary contains the 
level of the highest-priority pending I/O interrupt. All I/O interrupts are disabled in Trap 
State. (See the section on Traps, Interrupts, and Machine Checks, below.) 


2.16 Timers 


The K-1 has an Uptime Counter and an Interval Timer register, whose formats are 
shown in Figures 2-10 and 2-11. The Uptime Counter continuously counts the number of cy- 
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cles since the processor was last reset. (See Appendix C for a description of processor cy- 
cles.) The Interval Timer register contains an interval field, that is constantly being com- 
pared with the low-order 32 bits of the Uptime Counter. If there is a match and the clock in- 
terrupt enable (CIE) bit is set and the processor is not in Trap State, then an interval timer 
interrupt will be generated; this interrupt has priority level 0. If the processor is in Trap 
State, the interrupt is held pending until the processor exits Trap State. Generation of an 
interrupt effectively clears the CIE bit; the Interval Timer register must be loaded again if an- 
other interrupt is desired. 


6 0 
3 0 


Uptime 
(64) 


Figure 2-10. Uptime Counter 


Interval 


(32) 


Figure 2-11. Interval Timer Register 


The rut instruction can be used to read the Uptime Counter, and the wit instruction 
can be used to write the Interval Timer register. 


The size of the fields in the Uptime Counter and the Interval Timer register are imple- 
mentation-dependent [2-15]. 


2.17 Traps, Interrupts, and Machine Checks 


A variety of exceptional conditions that prevent further progress may arise during the 
execution of a program and cause a trap to occur. During a trap, certain processor state in- 
formation is saved and execution is transferred to a different address with (possibly) differ- 
ent processor status. Some of the exceptional conditions that can cause traps are data map 
misses, I/O, Interval Timer, and Console interrupts, machine checks and the execution of cer- 
tain instructions whose purpose is to cause a trap. When a trap occurs, the processor enters 
Trap State. 


With the exception of load/store unit instructions, all instructions that write results 
will still do so even if they trap. The results written, however, may be different depending on 
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whether there was a trap or not. See Chapter 5 for a description of the effects of traps on 
load/store unit instructions, and Appendix I for a description of the results returned by float- 
ing-point instructions in the presence or absence of traps. 


The K-1 architecture does not guarantee that instructions will be completed in the or- 
der in which they are encountered in the program. The order of operations may be changed if 
it does not affect the results of the calculations. While this is normally completely invisible 
to the programmer, it can become visible when a trap occurs. 


While execution can be continued after a trap, the imprecise nature of most traps 
means that results can not always be repaired before returning from the trap. In more detail, 
traps occurring before instruction issue (instruction map miss, illegal instruction/privilege vio- 
lations, trace traps, interrupts, trap instructions such as bpt, trap, etc.) are precise, and 
traps occurring after instruction issue (floating-point exceptions, data map miss, memory er- 
rors, etc.) are imprecise. A precise trap implies that execution has not proceeded past the 
instruction that trapped. With imprecise traps, some number of instructions after the trap- 
ping instruction may have been issued and completed. For example, if a floating-point multi- 
ply instruction traps, several instructions after the multiply may have issued and completed 
before the multiply trap is detected (suspending further instruction issue). The instructions 
following the multiply may have altered the input operands of the multiply, making recovery 
from the trap difficult, if not impossible. Furthermore, instructions that use the result of the 
trapping instruction could have been issued before the trap suspended instruction issue. 


Imprecise traps are a consequence of a highly pipelined machine with moderate func- 
tional unit latencies. Precise traps can be guaranteed in software by not modifying the oper- 
ands of an instruction and by not using its result until after any trap it could have issued has 
taken affect. (The special long constant form of the nop instruction is useful in this context — 
it can be used to wait for various events such as the completion of non-fixed-latency instruc- 
tions). See Appendix C for a discussion of functional unit latencies. 


When a K-1 CPU decides to trap, all new instruction execution is suspended. A sum- 
mary of all the “simultaneously” occurring traps is accumulated, as well as specific informa- 
tion about each one. This Trap Summary is explained in Appendix D. 


Regardless of the cause of a trap, certain restart information, referred to as the Re- 
start PCs, is also accumulated. The Restart PCs (Figure 2-12) are the first three instruc- 
tion addresses to fetch to resume execution of the program that trapped, and contain execu- 
tion disable bits to possibly void the execution of instructions fetched at those addresses 
(due to the effects of prior delayed branches and previously issued instructions). For both 
precise and imprecise traps, the Restart PCs allow the program to be resumed from the 
point at which execution was suspended. The exact location of the restart point depends up- 
on the cause of the trap. (See the description of decode traps, below and in Appendix D.) 
Three Restart PCs are necessary because of the K-1 architecture’s two-cycle delayed 
branching. 


In a Restart PC, the Restart IWA (Instruction Word Address) field contains bits 
40..3 of the address of an instruction word [2-3]. D0 and D1 are the disable bits associated 
with the instructions at address Restart IWA*8. If a disable bit is set for a particular Re- 
start PC, the corresponding IO or I1 instruction will be disabled when it is fetched. It is pos- 
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sible that both DO and D1 will be set, indicating that no instructions from that word are to be 
executed upon restart. 


6 44 44 0000 
Es 87 10 3210 


Unused Reserved Restart IWA 


(16) (7) (38) 


Figure 2-12. Restart PC 


Traps from load/store instructions, although imprecise, are treated specially. In order 
to support paging, traps from load/store instructions must be recoverable. A further compli- 
cation is that load/store instructions are also executed serially, and at the time of a trap, 
there may be quite a few load/store instructions already in the pipeline. To recover from an 
imprecise load/store instruction trap, it is necessary to simulate in software all of the 
load/store instructions that entered the pipeline before the trap suspended further instruction 
issue. All of the information needed to simulate these instructions is contained in a memory 
called the load/store queue. A copy of the contents of the load/store queue is frozen upon 
entering Trap State, and can be stored to memory with the slstrpd (store load/store trap da- 
ta) instruction. (See Appendix D for more information on recovering from load/store traps.) 


2.17.1 The Trap Sequence 


At the start of a trap sequence, the K-1 saves certain trap recovery information, 
which can be read with the rtrpd and slstrpd instructions, in internal memories in the CPU. 
This trap data includes the Trap Summary, a set of Trap Locators for each pipeline stage 
of each functional unit, the load/store queue, and the three Restart PCs. Next, the proces- 
sor enters a special state called Trap State. In Trap State, a number of fields in the Pro- 
cessor Status register act as if they are zero, except that a rps instruction reads out the cor- 
rect processor status. The affected fields are indicated in the last column of Table 2-5. In 
Trap State, as in supervisor mode, instruction mapping is done in an implementation depen- 
dent fashion [2-5]. Any trap while in Trap State will cause the system to halt and the Con- 
sole to be notified. 


There are five types of CPU-internal traps that can occur in any combination: decode 
traps, instruction fetch traps, integer overflow or check traps, floating-point traps, and 
load/store traps. In addition, a special CPU-internal trap, called a reset trap, occurs at sys- 
tem initialization time. (See the section on Reset Operation.) 


There are five types of external interrupts that are recognized by the K-1: NMI is a 
non-maskable interrupt that indicates that an over-temperature condition or other Console 
panic has been detected; CKI indicates an interval timer interrupt; RPI and WPI are inter- 
rupts that indicate that the Console interface’s read port and write port are full and empty, re- 
spectively; and IOI indicates an I/O system interrupt. Table 2-9 lists the priority level corre- 
sponding to each external interrupt. 
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Lastly, machine checks are error conditions that, in a correctly operating machine, 
should never occur. Unlike traps or external interrupts, they are more indicative of compo- 
nent, connector, or design failures, possibly of an intermittent or one-time nature. A number 
of units within the K-1 CPU cooperate to detect such conditions and facilitate graceful (if pos- 
sible) recovery. 


Table 2-9. External Interrupt Priority Levels 


Interrupt Type Priority Level 


T Non-maskable (except in Trap State), 
Processor Priority Level ignored. 


All decode traps, instruction fetch traps, and external interrupts are considered pre- 
cise traps since they are detected prior to instruction issue. All other traps and machine 
checks are imprecise traps as they occur after instruction issue. The Trap Summary reports 
all traps, interrupts, and machine checks that may have occurred during the time the CPU 
was not in Trap State. 


The format of the Trap Summary, the Trap Locators, and the load/store queue are 
implementation dependent; please see Appendix D for more information on these structures. 

To begin the Trap Sequence, the processor first vectors to either address 0 or 128 as 
determined by the type of trap (Table 2-10). General registers may be saved in memory us- 
ing store instructions. (The special storecpu instruction is provided for this purpose.)  Fol- 
lowing that, the processor internal state may be moved to the registers with rtrpd instruc- 
tions, and load/store state may be saved with slstrpd instructions. 


Table 2-10. Trap Vectoring 


Trap Type Vector Address 


Reset 
All other traps 


128 
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A return from a trap is accomplished by reloading the three Restart PCs with three 
exts instructions. Since these instructions must be executed while the processor is in Trap 
State, the operating system must first cause a trap (using the xtrap instruction, for exam- 
ple) to enter Trap State. Appendix D gives an example of trap state software. 

Note that using exts with a delayed execution control field that disables either of the 
following two instructions will produce unpredictable results. Also, while the hardware 
would never produce a Restart PC with I0 enabled and I1 disabled, returning from Trap 
State with such a PC is legal and produces the obvious result (I0 will get executed and [1 
will not). 


Table 2-11. DO and D1 Decoding for 32-bit Instructions 


Trapping Instruction 
Instruction Executed Next 


none 

not possible not ie 
10 Il 

I0 and/or it t 


' Decode traps can only be caused by I1 in this case. 


* both 10 and I have finished - apply same decoding to next PC 


Table 2-12. DO and D1 Decoding for a 64-bit Instruction 


Trapping Instruction 
Instruction Executed Next 


none 
not possible 
not possible 
10 


not ot 
not possible 


$ 


$ IO has finished - apply same decoding to next PC 


For decode traps and instruction fetch traps, which are mutually exclusive, the first 
Restart PC’s DO and D1 bits give an indication of which instruction(s) caused the trap or 
would have been executed next had there not been a trace trap or an instruction fetch trap. 
These bits should be interpreted according to Tables 2-11 and 2-12. For the bpt, trap, 
strap, and xtrap instructions, and for an illegal instruction/privilege violation, the disable bit 
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of the offending instruction will be set. Therefore, upon return, the same instruction word will 
be fetched, but the offending instruction will be disabled. For breakpoints this is not appropri- 
ate. In this case, the original instruction should be put back and the disable bit corresponding 
to the bpt instruction should be cleared. To be able to unambiguously determine which half of 
the instruction word executed the breakpoint, all bpt instructions must use a 32-bit format. 


2.18 Reset Operation 


When the K-1 is reset, the Processor Status register is undefined, Trap State is 
entered, and processing begins with a reset trap. Reset traps are distinguished from other 
traps by having a different vector address. In the case of a reset trap, all other trap informa- 
tion should be ignored. 


After first applying power to the K-1, the state of the memory, caches, and page ta- 
bles will be undefined, and in fact may contain parity or other uncorrectable errors. The mem- 
ory and data cache should be cleared using zcl and dflush instructions, the instruction cache 
should be cleared using ickill instructions, and the instruction and data page tables should be 
cleared using lipage and Idpage instructions. The ELF flags and the general registers must 
also be initialized. Bootstrap programs that initialize the state may be loaded by the Con- 
sole into a small section of the instruction cache. The Console may leave some interesting in- 
formation in certain general registers upon start-up. See Appendix G for details of the reset 
state of the machine. 
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The K-1 architecture has been designed with simplicity and regularity as important 
goals. This is most evident in the way K-1 instructions have been organized. For each pos- 
sible operation there is a corresponding unique opcode. For each opcode, a number of dif- 
ferent instruction formats are available. These formats differ in their lengths and in terms of 
what operands they provide to the functional units. A given opcode always performs the 
same operation regardless of which instruction format is used. It is not always true, howev- 
er, that a given opcode uses all of the fields available in a given format. 


PC-relative branches use a special format that does not require an opcode. Other 
types of branches require an opcode and are restricted to using specific formats. 


In the individual instruction descriptions in the following chapters, the functional unit 
that executes the instruction is explicitly named. The descriptions refer to the register file, 
flags, and ELF flags as if they were arrays, R[], F[], and ELF[]. Opcodes are given in hexa- 
decimal. Timing information can be found in Appendix C. The memory access function 
mem(size, paddr) and the address mapping function map(addr), used in the descriptions of 
the memory referencing instructions, are described at the beginning of Chapter 5S. 


3.1 Instruction Formats 


The K-1 architecture provides eight different instruction formats. Four of these are 32- 
bit, and four are 64-bit formats. The long (64-bit) formats provide long constants and target 
addresses for non-PC-relative branches. 32-bit instructions must be aligned on a 4-byte 
boundary, and 64-bit instructions must be aligned on an 8-byte boundary. 


This chapter makes use of the terms “IO” and “I1”, which were defined in Chapter 
2. An IO instruction resides in the most significant (lowest address) half of a 64-bit instruc- 
tion word (at a byte address that is evenly divisible by 8). I1 instructions reside in the least 
significant (highest address) half of a 64-bit instruction word (at a byte address that is con- 
gruent to 4 modulo 8). All long (64-bit) formats are legal only as IO instructions. One of the 
32-bit formats, the PC-relative branch format, is legal only as an I1 instruction. 


The instruction format specifies the origin of operands and the location where the re- 
sult is to be stored. Data operands to instructions are given the names srca, srcb, srcc, and 
fsrc. Not all instructions require all of these operands. The result, if any, is stored in a regis- 
ter, rdst, or a flag, fdst. 


Most formats specify a 2-bit format control field, FC, an 8-bit opcode field, OP, and a 
6-bit register address/flag number field, RC. srcc and rdst are specified by R{RC], fdst is 
specified by FIRC, gl (where RC, 9 is the low three bits of the RC field), and fsrc is speci- 


fied by FLOP, I (where OP, 9 is the low three bits of the opcode). srca and srcb are speci- 
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fied according to Table 3-1. In this table, as elsewhere, the term “short constant” format is 
used to refer to either the unconditional short constant format or the conditional short con- 
stant format. 


Table 3-1. Source Operand Control 


R[RA] R[RB] $ Register 
R[RAB] , S(SCON) Short constant 
S(SCON) R[RAB] $ Short constant 


R[RAB] $ S(LCON) Long constant 
S(LCON) R[RAB] Long constant 
- PC-relative branch 
- Absolute branch 
R[RA] Register branch 
R[RA] exts 


The register format (Figure 3-1) has an FC field with the binary value 10. In addi- 
tion to the standard RC field, this format contains two additional register address fields, RA 
and RB, that are the register addresses of srca and srcb, respectively, and two fields, I and 
FLG, that control conditional execution of the instruction. If the I field is a zero, the instruc- 
tion is executed only if the flag addressed by the FLG field is set (has the value one). If the 
I field is a one, the instruction is executed only if the addressed flag is clear. By specifying 
flag f7, which is always set, and I = 0, the instruction can be unconditionally executed. Using 
the register format, an instruction can have two register operands and can write its result to 
a third register. Some instructions read a third operand instead of, or in addition to, writing a 
result. 


Figure 3-1. Register Instruction Format 


eae tn ite dace , 
There are two versions of the short constant format; FC<0> controls whether this 
format is conditionally executed 


+ S(x) performs format-dependent extension and shifting of a constant to 64 bits. 
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The short constant format has two forms — unconditional and conditional (Figures 3- 
2 and 3-3). Both are similar to the register format except that an immediate, signed con- 
stant, SCON, can be used in place of either of the register operands. The RA field is re- 
named to RAB because R[RAB] can be directed to either srca or srcb under the control of 
the CA field (Table 3-1). If CA is a zero, then R[RAB] is srca and the constant is srcb. If 
CA is a one, then R[RAB] is srcb and the constant is srca. If FC<Q> is a 0, then the instruc- 
tion is in unconditional short constant format (Figure 3-2); if FC<0> is a 1, then the in- 
struction is in conditional short constant instruction format (Figure 3-3). The uncondition- 
al short constant format provides a 9-bit signed short constant (SCON9); the conditional 
short constant format provides a 5-bit signed short constant (SCONS), but allows condi- 
tional execution as in the register format. 


3.32 
109 


NN 
mh 


Figure 3-2. Unconditional Short Constant Instruction Format 


100 000 0 
098 432 0 


Es SCONS5 |I| FLG 
(6) 


Figure 3-3. Conditional Short Constant Instruction Format 


The long constant format (Figure 3-4) is the first of the 64-bit formats; it has an FC 
field with the binary value 11. The long constant format is similar to the conditional short 
constant format with the SCON field replaced by the 36-bit LCON field. As in the short 
constant formats, the CA field determines the ordering of R[RAB] and the constant (in this 
case LCON). The LF (LeFt) field controls the extending of LCON to 64 bits. If LF is a ze- 
ro, then LCON is right-justified within 64 bits and sign-extended. However, if LF is a one, 
then LCON is left adjusted within 64 bits and the low 28 bits are set to zero. An instruction 
using this format may also be conditionally executed by using the I and FLG fields, as in the 
register format. 
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Figure 3-4. Long Constant Instruction Format 


The PC-relative branch format (Figure 3-5), whose FC field has a binary value of 
11, is a 32-bit format that is allowed only in the I1 (low-order) half of an instruction word. 
Therefore, there must be another 32-bit format instruction in the IO (high-order) half of the 
same instruction word. If a PC-relative branch format instruction were put in the I0 half of 
an instruction word it would be interpreted as a long constant format instruction since both 
these formats use the same FC field. They are distinguished by where they occur in a 64-bit 
instruction word: the long constant format, a 64-bit instruction, must be 8-byte aligned and 
thus its FC field is always in the I0 half of an instruction word. 


FC PC-Relative Branch Offset DC |I}| FLG 
ila (23) (3) (3) 


Figure 3-5. PC-Relative Branch Instruction Format 


—W 
Ow 
won 
i ae) 
no 
oO 
Wo 
NO 
oo 


The remaining bits in the PC-relative branch format are a 23-bit signed PC-Rela- 
tive Branch Offset, a 3-bit delayed execution control field DC, and the I and FLG fields. 
The PC-Relative Branch Offset is a signed offset from the address of the I0 half of the in- 
struction word containing the PC-relative branch instruction, expressed as a number of 4- 
byte units. The branch address is calculated by summing the PC-Relative Branch Offset 
and the address of the first byte of the 64-bit instruction word containing the PC-relative 
branch. 


The I and FLG fields in the PC-relative branch format are used to control branching: 
if the flag is set and I is zero, or if the flag is clear and I is one, then the branch is taken. 
(Note that this is identical to the way conditional execution is controlled in other formats.) 
Execution of the instructions following the branch is controlled by the DC field (Table 3-2). 


Three variations of the long constant format are used for branches: if the opcode of an 
IO instruction is jump or call with an address expression operand (as opposed to a register 
operand), then the instruction is decoded in absolute branch format rather than long con- 
stant format; if the opcode is jump or call with a register operand or is ickill, then the instruc- 
tion is decoded in register branch format rather than long constant format; if the opcode is 
exts, then the instruction is decoded in exts format. In each of these cases the FC field must 
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have the binary value 11. Note that jump (call) with an address expression operand uses a 
different opcode from jump (call) with a register operand; if this was not the case, these two 
formats could not be distinguished. As noted in Table 3-2, the DC field also controls the PC 
stored for a call instruction, which is always an offset from the address of the call instruction. 


Table 3-2. Delayed Execution Control Decoding 


Stored PC Branch 
if call: not taken 
Address of call + and 


Assembler 
Mnemonic 


o) 
o) 


0 
1 
2 
3 
4 
5 
6 
¢ 


15' = execute from first delay word 


20d — execute from second delay word 


a = always 
b =if branch 
f =if fall-through 


The absolute branch format (Figure 3-6) has the same FC field as the long con- 
stant format, but is distinguished by the opcode being jump or call with an address expres- 
sion operand. The RC field specifies the register in which to store the PC if the opcode is 
call. The instruction is conditionally executed based on the I and FLG fields. If the instruc- 
tion is executed, then the branch address is the 41-bit Absolute Branch Address field of 
the instruction shifted left by two bit positions [3-1]. The delayed execution control bits op- 
erate as in the PC-relative branch format. (See Table 3-2). 


5 44 00 000 0 
3 76.432 0 


Absolute Branch Address 


(41) 


Figure 3-6. Absolute Branch Instruction Format 
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The register branch format (Figure 3-7) has the same FC field as the long constant 
format, and is used when the opcode is jump or call with a register operand, or is ickill. The 
RC field specifies the register in which to store the PC if the opcode is call. The instruction is 
conditionally executed based on the I and FLG fields. If the instruction is executed, then the 
branch address is taken from the register specified by the RA field. The delayed execution 
control bits operate as in the PC-relative branch format. (See Table 3-2). 


00 000 0 
76 432 0 


Unused DC jI] FLG 
(8) (6) (35) (3) 


Figure 3-7. Register Branch Instruction Format 


Wo 
NO 
—_- OO 
RN 
WN 
oo 
sf 


The exts instruction format (Figure 3-8) has the same FC field as the long constant 
format, and is used when the opcode is exts. The instruction is conditionally executed based 
on the I and FLG fields. If the instruction is executed, then the branch address is taken from 
the register specified by the RA field. The delayed execution control bits operate as in the 
PC-relative branch format, but using a delay code that could disable either of the following 
two instructions will produce unpredictable results. The exts Load Address field, when mul- 
tiplied by eight, gives an absolute virtual memory address whose contents are to be loaded 
into the register specified by the RC field. Bits 7 and 8 are unused in this format and must be 
zero. (Note that the exts Load Address is transformed to make it unique among all the 
CPUs in a multiprocessor; see the exts instruction description for details.) 


5 44 44 0 
3 9 


00 
87 
exts Load Address 
(33) 


Figure 3-8. exts Instruction Format 


Any attempt to execute an instruction with opcode jump, call, ickill, or exts as an I1 
instruction, or as an IO instruction with a format code other than 11, will result in an illegal in- 
struction/privilege violation trap. 
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Floating-point operations can be performed on 32-bit (single) or 64-bit (double) pre- 
cision numbers in IEEE format (ANSI/IEEE 754-1985). The precision is specified directly 
by the opcode, while the rounding mode, and ability to trap are controlled by bits in the Pro- 
cessor Status register. (Note that for convert instructions, the rounding mode can be speci- 
fied either by the srcb operand, or by the Processor Status register, as explained in section 
2 of this chapter). There are three classes of floating-point instructions: comparisons, con- 
versions and computations. The computations are carried out in the floating-point add, float- 
ing-point multiply and floating-point divide/square root units, the comparisons are done in 
the integer unit, and the conversions are performed in the floating-point add unit. Exact de- 
scriptions of how the conversions and computations are carried out (and under what condi- 
tions different types of exceptions may result) can be found in Appendix I. 


Many of these operations can result in floating-point exception conditions. If an ex- 
ception occurs, then the corresponding bit in the Arithmetic Exception Flags field in the Pro- 
cessor Status register will be set. These flags are “sticky” in that they may be set but may 
never be cleared by a floating-point instruction. The only way to clear an exception flag is 
with a wps instruction, which can set the flags to any value. Note, however, that setting a 
bit in the Arithmetic Exception Flags field with wps will not cause the corresponding trap, 
even if enabled. The only way to cause an arithmetic trap is to execute an instruction that 
causes the desired exception while traps are enabled for that exception. 


4.1 Floating-Point Compare Instructions 


Floating-point numbers of the same precision may be compared using the floating- 
point compare instructions. The complete set of IEEE comparisons may be made with the 
understanding that the order of the operands may be reversed, and that conditional branches 
and conditional execution can test the negation of the flags. For example, if the programmer 
wishes to determine if the double precision floating-point number in register rl is negative 
and set flag f3 accordingly, he may write: 


cmpgt.d 0.0, %r1, %f3 


since no cmplt.d instruction is available. The only exception to this is the uge (unordered or 
greater than or equal) comparison which requires two comparison instructions. 


In accordance with the IEEE standard, if one or both of the operands being compared 
is a NaN, then the quantities are considered to be unordered. For any two quantities, there- 
fore, exactly one of the four relationships: greater than, less than, equal, or unordered will 
be true. Invalid operation traps can be caused by floating-point comparisons if they are en- 
abled by the appropriate bit in the Arithmetic Trap Enables field in the Processor Status 
register (see Table 2-6) and if one of two other conditions occurs: 
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(1) one or both of the operands is a signaling NaN, or 
(2) the operands are unordered and the comparison test considers this to be 
an exception condition. 
The complete list of K-1 floating-point comparison operations is given in Table 4-1, in 
which the mnemonics for the operations have the following meanings: 
gt greater than 
ge greater than or equal 


Ig less than or greater than 
leg __ less than, equal, or greater than 
eq equal 


un unordered (i.e., at least one operand is a NaN) 
ueq __ unordered or equal 
ugt unordered or greater than 


Table 4-1. Floating-Point Comparisons 


Greater Less Exception if 


F 
F 
f 
7 
F 
F 
F 
F 


Noninnianys 
Sete on St 
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Instructions: cmp({gt,ge,lg,leg,eq,un,ueq,ugt}.d  srca,srcbfdst 
Opcodes: cmpgt.d 31 

cmpge.d 37 

cmplg.d 33 


cmpleg.d 35 
cmpeq.d 3B 
cmpun.d 3D 
cmpueq.d 3F 
cmpugt.d 39 


Operation: Compare 64-bit precision floating-point numbers and set flag. 
Operands used: srca, srcb 

Results stored: fdst 

Legal in: User or Supervisor mode 

Functional unit: Integer 

Exceptions: Invalid operation 

Description: 


The two 64-bit precision floating-point numbers, srca and srcb, are compared and the 
result of the comparison is recorded in the flag fdst. srca is the left comparand and srcb is the 
right comparand. See Table 4-1 for the definitions of the individual tests. If one of the oper- 
ands is a signaling NaN, or if one of the operands is a quiet NaN and Table 4-1 indicates an 
exception if unordered, then an invalid operation exception will occur and the corresponding 
exception flag will be set. If enabled by the corresponding Arithmetic Trap Enable in the 
Processor Status register, this exception will cause a trap. 
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Instructions: 


Opcodes: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Floating-Point Compare Instructions 4.1 


cmp({gt,ge,lg,leg,eq,un,ueq,ugt}.s srca,srcb,fdst 


cmpgt.s 30 
cmpge.s 36 
cmplg.s ae 


cmpleg.s 34 

cmpeq.s 3A 

cmpun.s 3C 

cmpueq.s 3E 

cmpugt.s 38 

Compare 32-bit precision floating-point numbers and set flag. 
srca, srcb 

fast 

User or Supervisor mode 


Integer 


Invalid operation 


The two 32-bit precision floating-point numbers, srca and srcb, are compared and the 
result of the comparison is recorded in the flag fdst. srca is the left comparand and srcb is the 
right comparand. See Table 4-1 for the definitions of the individual tests. If one of the oper- 
ands is a signaling NaN, or if one of the operands is a quiet NaN and Table 4-1 indicates an 
exception if unordered, then an invalid operation exception will occur and the corresponding 
exception flag will be set. If enabled by the corresponding Arithmetic Trap Enable in the 
Processor Status register, this exception will cause a trap. 


K-1 Architecture Manual CONFIDENTIAL Amdahl Key Computer Labs 


4.2 Floating-Point Conversion Instructions 4-5 


4.2 Floating-Point Conversion Instructions 


Numbers may be converted between 64-bit and 32-bit floating-point formats and be- 
tween either of the floating-point precisions and 64-bit integer format. Conversion may re- 
quire rounding and may result in floating-point overflow, underflow, invalid operation, or inex- 
act exceptions (and possibly traps, if enabled). Conversion of a quiet NaN from one floating- 
point format to another is possible without causing an exception. 


In order to provide a means for library routines to use a fixed rounding mode (instead 
of the Rounding Mode<1..0> field of the Processor Status register which could have been 
set arbitrarily by an application program), all but one of the floating-point conversion instruc- 
tions take a srcb argument (shown in Figure 4-1) which can directly specify the rounding 
mode. The RMB bit, if set, causes the rounding mode to come from the RM field of the srcb 
argument, instead of the Rounding Mode<1..0> field of the Processor Status register. If 
the RMB bit is not set, the RM field is ignored and the rounding mode in the Processor Sta- 
tus register is used. Note that the cvts.d (convert from single precision to double precision) 
instruction will never have to round, and thus does not require a srcb argument. 

6 000 
) sal 


Unused 


(61) 


Figure 4-1. Floating-Point Conversion srcb Argument 


In the following instruction descriptions, the phrase “according to the rounding mode 
specified in srcb” should be understood to mean either the rounding mode specified in the 
RM field of srcb (if the RMB bit is set), or the rounding mode specified in the Processor 
Status register (if the RMB bit is not set). 
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Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Floating-Point Conversion Instructions 4.2 


cvtd.s srca,srcb,rdst 

94 

Convert a 64-bit precision floating-point number to 32-bit precision. 
srca, srcb 

rdst 

User or Supervisor mode 

Floating-Point Add 


Invalid operation, floating-point overflow, floating-point underflow, in- 
exact 


The 64-bit floating-point operand, srca, is rounded to 32-bit format according to the 
rounding mode specified in srcb, and then stored in rdst. A quiet NaN is converted without 
generating any exceptions. An invalid operation exception results if srca is a signaling NaN. 
If the exponent of the resulting number cannot fit in the 32-bit format’s exponent field, then a 
floating-point overflow or underflow exception occurs, as appropriate. If there was any loss 
of precision, then an inexact exception occurs. If an exception happens, the corresponding ex- 
ception flag will be set. If enabled by the corresponding Arithmetic Trap Enables in the 
Processor Status register, any of these exceptions will cause a trap. 
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Instruction: cvts.d srca,rdst 

Opcode: 84 

Operation: Convert a 32-bit precision floating-point number to 64-bit precision. 
Operands used: srca 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Floating-Point Add 

Exceptions: Invalid operation 

Description: 


The 32-bit floating-point operand, srca, is converted to 64-bit format and stored in 
rdst. No rounding is necessary, nor can there be any overflow or underflow. A quiet NaN is 
converted without generating any exceptions. An invalid operation exception results if srca 
is a signaling NaN. If an exception happens, the corresponding exception flag will be set. If 
enabled by the corresponding Arithmetic Trap Enable in the Processor Status register, 
this exception will cause a trap. 
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Instruction: cvtd.l srca,srcb,rdst 

Opcode: 95 

Operation: Convert a 64-bit precision floating-point number to a 64-bit integer. 
Operands used: srca, srcb 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Floating-Point Add 

Exceptions: Invalid operation, inexact 

Description: 


The 64-bit floating-point number, srca, is rounded to an integer value according to the 
rounding mode specified in srcb, and the result is converted to integer format and stored in 
rdst. If the result for a finite input cannot be represented as a signed 64-bit integer, or if srca 
is positive or negative infinity, then an invalid operation exception is signaled and the largest 
integer of the same sign as srca is stored. An invalid operation exception also results if srca 
is a signaling or quiet NaN, and a zero is stored. If there was any loss of precision, then an 
inexact exception occurs. If an exception happens, the corresponding exception flag will be 
set. If enabled by the corresponding Arithmetic Trap Enables in the Processor Status 
register, any of these exceptions will cause a trap. 
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Instruction: cvts.] srca,srcb,rdst 

Opcode: 85 

Operation: Convert a 32-bit precision floating-point number to a 64-bit integer. 
Operands used: srca 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Floating-Point Add 

Exceptions: Invalid operation, inexact 

Description: 


The 32-bit floating-point number, srca, is rounded to an integer value according to the 
rounding mode specified in srcb, and the result is converted to integer format and stored in 
rdst. If the result for a finite input cannot be represented as a signed 64-bit integer, or if srca 
is positive or negative infinity, then an invalid operation exception is signaled and the largest 
integer of the same sign as srca is stored. An invalid operation exception also results if srca 
is a signaling or quiet NaN, and a zero is stored. If there was any loss of precision, then an 
inexact exception occurs. If an exception happens, the corresponding exception flag will be 
set. If enabled by the corresponding Arithmetic Trap Enables in the Processor Status 
register, any of these exceptions will cause a trap. 
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Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Floating-Point Conversion Instructions 4.2 


evtl.d srca,srcb,rdst 
97 


Convert a signed 64-bit integer to a 64-bit precision floating-point 
number. 


srca, srcb 

rdst 

User or Supervisor mode 
Floating-Point Add 


Inexact 


The signed 64-bit integer, srca, is rounded to a 64-bit floating-point number according 
to the rounding mode specified in srcb. The result is stored in rdst. If there was any loss of 
precision, then an inexact exception occurs. If an exception happens, the corresponding ex- 
ception flag will be set. If enabled by the corresponding Arithmetic Trap Enable in the Pro- 
cessor Status register, this exception will cause a trap. 


K-1 Architecture Manual CONFIDENTIAL Amdahl Key Computer Labs 


4.2 


Instruction: 
Opcode: 


Operation: 


Operands used: 
Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Floating-Point Conversion Instructions 4-11 


cevtul.d srca,srcb,rdst 
98 


Convert an unsigned 64-bit integer to a 64-bit precision floating-point 
number. 


srca, srcb 

rdst 

User or Supervisor mode 
Floating-Point Add 


Inexact 


The unsigned 64-bit integer, srca, is rounded to a 64-bit floating-point number accord- 
ing to the rounding mode specified in srcb. The result is stored in rdst. If there was any loss 
of precision, then an inexact exception occurs. If an exception happens, the corresponding ex- 
ception flag will be set. If enabled by the corresponding Arithmetic Trap Enable in the Pro- 
cessor Status register, this exception will cause a trap. 
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Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Floating-Point Conversion Instructions 4.2 


cvtl.s srca,srcb,rdst 
96 


Convert a signed 64-bit integer to a 32-bit precision floating-point 
number. 


srca, srcb 

rdst 

User or Supervisor mode 
Floating-Point Add 


Inexact 


The signed 64-bit integer, srca, is rounded to a 32-bit floating-point number according 
to the rounding mode specified in srcb. The result is stored in rdst. If there was any loss of 
precision, then an inexact exception occurs. If an exception happens, the corresponding ex- 
ception flag will be set. If enabled by the corresponding Arithmetic Trap Enable in the Pro- 
cessor Status register, this exception will cause a trap. 
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4.3 Floating-Point Computation Instructions 


The floating-point computation instructions perform addition, subtraction, negation, 
multiplication, division, and square root. One of three floating-point functional units is in- 
volved in each calculation. All operations except negation generate rounded results accord- 
ing to the rounding mode in the Processor Status register, and, depending upon the opera- 
tion and the input operands, can cause invalid operation, floating-point overflow, floating- 
point underflow, floating-point division by zero, and inexact exceptions. Negation is consid- 
ered to be a data moving operation and therefore never causes any exceptions. 
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Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Floating-Point Computation Instrucitons 4.3 


neg.d srca,rdst 

92 

Compute the negative of a 64-bit precision floating-point number. 
srca 

rdst 

User or Supervisor mode 

Floating-Point Add 


none 


The negative of the 64-bit floating-point operand, srca, is computed and stored in 
rdst. The operation is performed by complementing the sign bit of srca regardless of the val- 
ue it represents. No exception can be generated by negation. 


K-1 Architecture Manual CONFIDENTIAL Amdahl Key Computer Labs 


4.3 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Floating-Point Computation Instrucitons 4-15 


neg.s srca,rdst 

82 

Compute the negative of a 32-bit precision floating-point number. 
srca 

rdst 

User or Supervisor mode 

Floating-Point Add 


none 


The negative of the 32-bit floating-point operand, srca, is computed and stored in 
rdst. The operation is performed by complementing the sign bit of srca regardless of the val- 
ue it represents. No exception can be generated by negation. 
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Instruction: add.d srca,srcb,rdst 

Opcode: 90 : 

Operation: Add two 64-bit precision floating-point numbers. 

Operands used: srca, srcb 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Floating-Point Add 

Exceptions: Invalid operation, floating-point overflow, floating-point underflow, in- 
exact 

Description: 


The rounded sum of the two 64-bit floating-point operands, srca and srcb, is computed 
and stored in rdst. Rounding is performed according to the rounding mode specified in the 
Processor Status register. Floating-point overflow or floating-point underflow exceptions 
occur if the magnitude of a finite result is too big or too small to be represented in the 64-bit 
floating-point format. If traps are disabled, the result will be infinity or zero, respectively. If 
traps are enabled, the result will be as described in Appendix I. An invalid operation excep- 
tion will occur if either or both of the operands is a signaling NaN, or if one of the operands is 
positive infinity while the other operand is negative infinity. The result in either case will be 
a quiet NaN. If there was any loss of precision, then an inexact exception occurs. If an ex- 
ception happens, the corresponding exception flag will be set. If enabled by the correspond- 
ing Arithmetic Trap Enables in the Processor Status register, any of these exceptions will 
cause a trap. If none of the inputs is a signaling NaN, and one or more of the inputs is a quiet 
NaN, then there will be no exception and the result will be a quiet NaN. 
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Instruction: add.s srca,srcb,rdst 

Opcode: 80 

Operation: Add two 32-bit precision floating-point numbers. 

Operands used: srca, srcb 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Floating-Point Add 

Exceptions: Invalid operation, floating-point overflow, floating-point underflow, in- 
exact 

Description: 


The rounded sum of the two 32-bit floating-point operands, srca and srcb, is computed 
and stored in rdst. Rounding is performed according to the rounding mode specified in the 
Processor Status register. Floating-point overflow or floating-point underflow exceptions 
occur if the magnitude of a finite result is too big or too small to be represented in the 32-bit 
floating-point format. If traps are disabled, the result will be infinity or zero, respectively. If 
traps are enabled, the result will be as described in Appendix I. An invalid operation excep- 
tion will occur if either or both of the operands is a signaling NaN, or if one of the operands is 
positive infinity while the other operand is negative infinity. The result in either case will be 
a quiet NaN. If there was any loss of precision, then an inexact exception occurs. If an ex- 
ception happens, the corresponding exception flag will be set. If enabled by the correspond- 
ing Arithmetic Trap Enables in the Processor Status register, any of these exceptions will 
cause a trap. If none of the inputs is a signaling NaN, and one or more of the inputs is a quiet 
NaN, then there will be no exception and the result will be a quiet NaN. 
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Instruction: sub.d srca,srcb,rdst 

Opcode: 91 

Operation: Subtract one 64-bit precision floating-point number from another. 

Operands used: srca, srcb 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Floating-Point Add 

Exceptions: Invalid operation, floating-point overflow, floating-point underflow, in- 
exact 

Description: 


The rounded difference, srcb - srca, of the two 64-bit floating-point operands, srca and 
srcb, is computed and stored in rdst. Note that the order of the operands is backwards from 
what might be expected. Rounding is performed according to the rounding mode specified in 
the Processor Status register. Floating-point overflow or floating-point underflow excep- 
tions occur if the magnitude of a finite result is too big or too small to be represented in the 
64-bit floating-point format. If traps are disabled, the result will be infinity or zero, respec- 
tively. If traps are enabled, the result will be as described in Appendix I. An invalid opera- 
tion exception will occur if either or both of the operands is a signaling NaN, or if both of the 
operands are infinities with the same sign. The result in either case will be a quiet NaN. If 
there was any loss of precision, then an inexact exception occurs. If an exception happens, 
the corresponding exception flag will be set. If enabled by the Arithmetic Trap Enables in 
the Processor Status register, any of these exceptions will cause a trap. If none of the in- 
puts is a signaling NaN, and one or more of the inputs is a quiet NaN, then there will be no 
exception and the result will be a quiet NaN. 
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Instruction: sub.s srca,srcb,rdst 

Opcode: 81 

Operation: Subtract one 32-bit precision floating-point number from another. 

Operands used: srca, srcb 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Floating-Point Add 

Exceptions: Invalid operation, floating-point overflow, floating-point underflow, in- 
exact 

Description: 


The rounded difference, srcb - srca, of the two 32-bit floating-point operands, srca and 
srcb, is computed and stored in rdst. Note that the order of the operands is backwards from 
what might be expected. Rounding is performed according to the rounding mode specified in 
the Processor Status register. Floating-point overflow or floating-point underflow excep- 
tions occur if the magnitude of a finite result is too big or too small to be represented in the 
32-bit floating-point format. If traps are disabled, the result will be infinity or zero, respec- 
tively. If traps are enabled, the result will be as described in Appendix I. An invalid opera- 
tion exception will occur if either or both of the operands is a signaling NaN, or if both of the 
operands are infinities with the same sign. The result in either case will be a quiet NaN. If 
there was any loss of precision, then an inexact exception occurs. If an exception happens, 
the corresponding exception flag will be set. If enabled by the Arithmetic Trap Enables in 
the Processor Status register, any of these exceptions will cause a trap. If none of the in- 
puts is a signaling NaN, and one or more of the inputs is a quiet NaN, then there will be no 
exception and the result will be a quiet NaN. 
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Instruction: mult.d srca,srcb,rdst 

Opcode: Al 

Operation: Multiply two 64-bit precision floating-point numbers. 

Operands used: srca, srcb 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Floating-Point Multiply 

Exceptions: Invalid operation, floating-point overflow, floating-point underflow, in- 
exact 

Description: 


The rounded product of the two 64-bit floating-point operands, srca and srcb, is com- 
puted and stored in rdst. Rounding is performed according to the rounding mode specified in 
the Processor Status register. Floating-point overflow or floating-point underflow excep- 
tions occur if the magnitude of a finite result is too big or too small to be represented in the 
64-bit floating-point format. If traps are disabled, the result will be infinity or zero, respec- 
tively. If traps are enabled, the result will be as described in Appendix I. An invalid opera- 
tion exception will occur if either or both of the operands is a signaling NaN, or if one of the 
operands is infinity while the other operand is zero. The result in either case will be a quiet 
NaN. If there was any loss of precision, then an inexact exception occurs. If an exception 
happens, the corresponding exception flag will be set. If enabled by the Arithmetic Trap En- 
ables in the Processor Status register, any of these exceptions will cause a trap. If none of 
the inputs is a signaling NaN, and one or more of the inputs is a quiet NaN, then there will 
be no exception and the result will be a quiet NaN. 
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Instruction: mult.s srca,srcb,rdst 

Opcode: AO 

Operation: Multiply two 32-bit precision floating-point numbers. 

Operands used: srca, srcb 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Floating-Point Multiply 

Exceptions: Invalid operation, floating-point overflow, floating-point underflow, in- 
exact 

Description: 


The rounded product of the two 32-bit floating-point operands, srca and srcb, is com- 
puted and stored in rdst. Rounding is performed according to the rounding mode specified in 
the Processor Status register. Floating-point overflow or floating-point underflow excep- 
tions occur if the magnitude of a finite result is too big or too small to be represented in the 
32-bit floating-point format. If traps are disabled, the result will be infinity or zero, respec- 
tively. If traps are enabled, the result will be as described in Appendix I. An invalid opera- 
tion exception will occur if either or both of the operands is a signaling NaN, or if one of the 
operands is infinity while the other operand is zero. The result in either case will be a quiet 
NaN. If there was any loss of precision, then an inexact exception occurs. If an exception 
happens, the corresponding exception flag will be set. If enabled by the Arithmetic Trap En- 
ables in the Processor Status register, any of these exceptions will cause a trap. If none of 
the inputs is a signaling NaN, and one or more of the inputs is a quiet NaN, then there will 
be no exception and the result will be a quiet NaN. 
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Instruction: div.d srca,srcb,rdst 

Opcode: A3 

Operation: Divide one 64-bit precision floating-point number by another. 

Operands used: srca, srcb 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Floating-Point Divide/Square Root 

Exceptions: Invalid operation, division by zero, floating-point overflow, floating- 


point underflow, inexact 


Description: 


The rounded quotient, srca + srcb, of the two 64-bit floating-point operands, srca and 
srcb, is computed and stored in rdst. Rounding is performed according to the rounding mode 
specified in the Processor Status register. Floating-point overflow or floating-point under- 
flow exceptions occur if the magnitude of a finite result is too big or too small to be represent- 
ed in the 64-bit floating-point format. If traps are disabled, the result will be infinity or zero, 
respectively. If traps are enabled, the result will be as described in Appendix I. An invalid 
operation exception will occur if either or both of the operands is a signaling NaN, or if both of 
the operands are zero, or if both of the operands are infinity. The result in any case will be a 
quiet NaN. If the numerator is not zero, NaN, or infinity while the denominator is zero, then a 
division by zero exception will occur and the correctly signed infinity will be produced as a re- 
sult. If there was any loss of precision, then an inexact exception occurs. If an exception hap- 
pens, the corresponding exception flag will be set. If enabled by the Arithmetic Trap En- 
ables in the Processor Status register, any of these exceptions will cause a trap. If none of 
the inputs is a signaling NaN, and one or more of the inputs is a quiet NaN, then there will 
be no exception and the result will be a quiet NaN. 
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Instruction: div.s srca,srcb,rdst 

Opcode: A2 

Operation: Divide one 32-bit precision floating-point number by another. 

Operands used: srca, srcb 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Floating-Point Divide/Square Root 

Exceptions: Invalid operation, division by zero, floating-point overflow, floating- 


point underflow, inexact 


Description: 


The rounded quotient, srca + srcb, of the two 32-bit floating-point operands, srca and 
srcb, is computed and stored in rdst. Rounding is performed according to the rounding mode 
specified in the Processor Status register. Floating-point overflow or floating-point under- 
flow exceptions occur if the magnitude of a finite result is too big or too small to be represent- 
ed in the 32-bit floating-point format. If traps are disabled, the result will be infinity or zero, 
respectively. If traps are enabled, the result will be as described in Appendix I. An invalid 
operation exception will occur if either or both of the operands is a signaling NaN, or if both of 
the operands are zero, or if both of the operands are infinity. The result in any case will be a 
quiet NaN. If the numerator is not zero, NaN, or infinity while the denominator is zero, then a 
division by zero exception will occur and the correctly signed infinity will be produced as a re- 
sult. If there was any loss of precision, then an inexact exception occurs. If an exception hap- 
pens, the corresponding exception flag will be set. If enabled by the Arithmetic Trap En- 
ables in the Processor Status register, any of these exceptions will cause a trap. If none of 
the inputs is a signaling NaN, and one or more of the inputs is a quiet NaN, then there will 
be no exception and the result will be a quiet NaN. 
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Instruction: sqrt.d srca,rdst 

Opcode: B3 

Operation: Compute the square root of a 64-bit precision floating-point number. 
Operands used: srca 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Floating-Point Divide/Square Root 

Exceptions: Invalid operation, inexact 

Description: 


The square root of the 64-bit operand, srca, is computed and stored in rdst. Rounding 
is performed according to the rounding mode specified in the Processor Status register. If 
the input operand is finite and greater than or equal to zero, the result will be finite and non- 
negative. If the input operand is negative zero or positive infinity, the result will be negative 
zero Or positive infinity, respectively. An invalid operation exception will occur if srca is ei- 
ther a signaling NaN, or a negative operand other than negative zero (including negative in- 
finity). The result in either case will be a quiet NaN. If there was any loss of precision, then 
an inexact exception occurs. If an exception happens, the corresponding exception flag will 
be set. If enabled by the Arithmetic Trap Enables in the Processor Status register, either 
of these exceptions will cause a trap. If the input is a quiet NaN, then there will be no excep- 
tion and the result will be a quiet NaN. 
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Instruction: sqrt.s srca,rdst 

Opcode: B2 

Operation: Compute the square root of a 32-bit precision floating-point number. 
Operands used: srca 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Floating-Point Divide/Square Root 

Exceptions: Invalid operation, inexact 

Description: 


The square root of the 32-bit operand, srca, is computed and stored in rdst. Rounding 
is performed according to the rounding mode specified in the Processor Status register. If 
the input operand is finite and greater than or equal to zero, the result will be finite and non- 
negative. If the input operand is negative zero or positive infinity, the result will be negative 
zero Or positive infinity, respectively. An invalid operation exception will occur if srca is ei- 
ther a signaling NaN, or a negative operand other than negative zero (including negative in- 
finity). The result in either case will be a quiet NaN. If there was any loss of precision, then 
an inexact exception occurs. If an exception happens, the corresponding exception flag will 
be set. If enabled by the Arithmetic Trap Enables in the Processor Status register, either 
of these exceptions will cause a trap. If the input is a quiet NaN, then there will be no excep- 
tion and the result will be a quiet NaN. 
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CHAPTER 5. Load and Store Instructions 


This chapter describes load and store instructions, which are used to move data be- 
tween the registers and main memory. It also describes instructions that read, write and 
test the ELF flags, instructions that are used for memory diagnostics, and instructions that 
write Data Watchpoint Table entries. 


5.1 Referencing Memory 


The operands of load, eload and store instructions are used to specify source or des- 
tination register addresses and to calculate memory addresses. A memory address may be 
specified directly by an operand, or it may be calculated by multiplying the index operand by 
1, 2, 4, or 8 as specified by the opcode, and adding the result to the base operand (Figure 2- 
5). (The index multiplier, implicit in the opcode, is referred to as m in the instruction descrip- 
tions.) Memory addresses are architecturally limited to 48 bits, and may be restricted fur- 
ther by implementations of the architecture [5-1]. The opcode also implicitly specifies a 
memory precision (1, 2, 4, or 8 bytes). If the memory address is not a multiple of the preci- 
sion, an illegal address condition will occur. 


After the address is calculated, it may be modified if either of the Byte Order Low-to- 
High (Table 2-8) or Small Address Compatibility Mode bits in the Processor Status reg- 
ister is set. (See Processor Status in Chapter 2). In brief, Byte Order Low-to-High will 
transform the low three bits of the address according to Table 2-8, and Small Address Com- 
patibility Mode will clear any bits in the address above bit 31. The loadcpu, storecpu, and 
exts instructions perform an additional translation to make the address unique among all of 
the CPUs in a multiprocessor [5-6]. The address resulting from these three possible trans- 
formations is a virtual address and must be mapped into a physical address before accessing 
any data. This translation process also produces a set of protection bits controlling read and 
write access, and a bit that indicates if the data is shared. (See the section on Data Map- 
ping in Chapter 2). The virtual-to-physical address mapping function depends on whether 
the reference is a user or a supervisor mode reference. 


The User Mode Load and User Mode Store bits control whether memory referenc- 
es are treated as user mode references or supervisor mode references. This topic is dis- 
cussed in the section on Data Mapping in Chapter 2, and is illustrated in Table 2-4. The Us- 
er Protection bit controls whether user mode references use the protections provided in the 
data page table, or whether they get supervisor “over-ride” privileges. This topic is also 
discussed in the section on Data Mapping in Chapter 2. 


Supervisor mode references use the identity mapping to produce a physical address; 
this mapping also produces a default set of protection bits that grant both read and write ac- 
cess and indicate that the page is shared. For a user mode reference, the translation process 
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reads the data page table at an address that is a function of the given virtual address and the 
Process Key<12..0> field of the Processor Status register [5-2]. If the data page table 
entry that is read is not the entry corresponding to the given address, a data map miss trap 
will occur. (The data page table is a cache and cannot simultaneously map all possible pag- 
es). If the entry is found, then it contains the physical address, the access bits and the 
shared bit. Finally, for a user mode reference, if the User Protection bit in the Processor 
Status register is off, then the read and write access permission bits are both forced on if the 
page is valid (has either read or write access). 


The above mapping process is performed for all memory reference instructions and 
can result in a data map miss for user mode references. The access bits that are produced 
are used to check whether the reference to be performed is legal. References that read mem- 
ory (load, eload, pel, etc.) require read access; references that write memory (store, dflush, 
etc.) require write access; swat instructions, which both read and write memory, require both 
read and write access. If the reference is not allowed and the instruction is not an eload, an 
illegal access trap will occur. For eload instructions, no illegal access trap is generated; this 
condition is signaled by storing zero in rdst and setting ELF[RC]. 


If no problems occur during the computation and mapping of the memory address, then 
the requested load/store reference will take place. There are signed and unsigned versions 
of all load (and eload) instructions for precisions less than 8 bytes. The signed version of a 
load will sign-extend the data read to 64 bits; the unsigned version will zero-extend the da- 
ta to 64 bits. For 64-bit data, the assembler accepts both signed and unsigned instruction 
mnemonics (though both refer to the same opcode); load.I and loadu.l, for example, both per- 
form a load of 64 bits of data into a register. 


Constants may be used in place of registers as operands (see Chapter 3), except that 
data to be stored in memory may not be a constant. To store a constant value in a memory lo- 
cation, the constant must first be loaded into a register, as in the following sequence: 


move <constant>, %rl 
store. %r1,<memory address> 


The function map(addr), which is referred to extensively in the instruction descrip- 
tions later in this chapter, performs all the transformations necessary to turn its virtual ad- 
dress argument into a physical address that can be used to reference memory. This includes 
the effects of the Byte Order Low-to-High and Small Address Compatibility Mode bits in 
the Processor Status register, the special transformations performed by the loadcpu, 
storecpu, and exts instructions, and virtual-to-physical address mapping. Note that the 
map function will not return a result larger than the size of a physical memory address [5-1]. 
Memory referencing instructions can trap during the mapping process (for traps such as ille- 
gal address, data map miss, and illegal access), or after referencing memory (for traps such 
as Nonexistent Memory and ECC errors). 


The memory access function mem(size, paddr) refers to size consecutive bytes start- 
ing at physical address paddr. That is, paddr is a result of the map(addr) function. 
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5.2 Memory-Referencing Instruction Traps 


A number of trapping conditions are common to major classes of memory-referencing 
instructions. Table 5-1 shows the traps that can occur for load-type, eload-type, store- 
type, dflush, and zcl instructions. Some special load/store instructions, such as dflush, exts 
and slstrpd are described in other chapters of this manual but refer to Table 5-1 to enumer- 
ate the conditions under which they trap. The individual instruction descriptions that follow 
refer to these trap classes instead of listing all the traps individually. Explanations of the cir- 
cumstances under which these traps can occur also appear below and not with the individual 
instruction descriptions. See Appendix D for more information on load/store traps. 


Table 5-1. Memory Referencing Instruction Traps 


Data Map Miss 
Illegal Address 
Illegal Access 
Data Watchpoint 


PM/CT Parity 

ECC (1st sub-line) 

ECC (subsequent sub-line) 
Memory-related Parity 
Nonexistent Memory 


! for user mode references only 
2 


3 


if the Early Load Alignment Trap bit in the Processor Status register is set, and ... 
if the precision is 2, 4 or 8 bytes 

+ if sufficient low-order address bits are not zero [5-5] 

5 Yes, except in Trap State which disables Data Watchpoint traps 


A data map miss trap occurs when a user mode reference is not found in the data 
page table. A complete description of this issue is given in Chapter 2 in the section on Data 
Mapping. 

An illegal address trap occurs when the address of the quantity being referenced is 
not a multiple of its precision. For example, a 4-byte load must use a 4-byte aligned ad- 
dress (the low two bits of the address must be zero). Note that the zcl instruction referenc- 
es an entire data cache line and thus requires more low-order address bits to be zero than 
load instructions (which reference eight bytes or less) [5-5]. 
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An illegal access trap occurs when a user mode reference does not have the proper 
permissions in the data page table for the requested reference. (I.e., a load must have read 
permission). This topic is also discussed in Chapter 2 in the section on Data Mapping. 


A data watchpoint trap occurs when the referenced address matches an address in 
one of several Data Watchpoint Table entries in the load/store unit. The Data Watchpoint 
Table entries (or just Data Watchpoints) can be set with the wdwp instruction. See the wd- 
wp instruction description for details on setting Data Watchpoints and the conditions under 
which they will cause a trap (or stop the system clocks). 


A page map/cache tag parity error can occur when reading the data page table 
(also called the page map) for a user mode reference, or when reading the tag fields of the 
data cache (which are read for all memory references). There are both recoverable (occurring 
with a data cache miss) and non-recoverable (occurring with a data cache hit) versions of 
this error. 


A cache miss will transfer an entire cache line of data from the memory system into 
the cache, but this transfer actually comprises a series of smaller transfers of sub-lines (the 
largest unit that can be read from or written to the cache at one time). The K-1 distinguishes 
two types of ECC (Error Correcting Code) errors: ECC errors on the first sub-line trans- 
ferred, and ECC errors on subsequent sub-lines. In order to decrease latency, the memory 
system always returns the sub-line with the referenced data first. Thus, for load and eload 
instructions (including Idecc, Idnecc, and loadcpu) the distinction between first and subse- 
quent lines indicates whether the data written to the register file is correct or not. When da- 
ta with ECC errors is returned from the memory system, the cache tags for that line are in- 
validated (so that the incorrect data cannot be accessed). Note that a store instruction can 
get ECC errors since a store that misses in the data cache must retrieve the referenced 
cache line from the memory system. store instructions, however, don’t care which type of 
ECC error they receive — the data can not be marked as valid in the data cache so the store 
can not complete. Correctable and uncorrectable memory errors are logged as described in 
Appendices E and G, 


A memory-related parity error is caused when the processor fails to complete a 
memory operation because of an internal error in the memory subsystem. This error is usual- 
ly indicative of serious problems in the memory system. 


A nonexistent memory error occurs when a reference outside the bounds of physical 
memory is made. For a user mode reference, this can only happen if the data page table has 
been set up incorrectly. 


For all traps except the ECC error on subsequent sub-lines trap and the non-recover- 
able version of the page map/cache tag parity error trap, all load (including Idecc, Idnecc, 
and loadcpu) and eload instructions that trap will not modify their destination register. 
eload instructions that detect an error but do not trap (as for an illegal access) will always 
write their destination register to zero. The cases in which this can happen are: illegal ac- 
cess errors (which can only happen with user mode references), illegal address errors (but 
only if the Early Load Alignment Trap bit in the Processor Status register is clear), and 
nonexistent memory errors for supervisor mode references. These three cases will be re- 
ferred to as the “special eload error conditions” in the instruction descriptions. The section 
on Early Load in Chapter 2 gives a complete description of the trap behavior of eload in- 
structions. 
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5.3 Load Instructions 


The following instruction descriptions are for all the “normal” load and eload instruc- 
tions. Several special types of load instructions are covered in a later section. 
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Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions 53 


loadu.b (srca)[srcb:1],rdst 

D1 

Load one byte of data from memory into a register. 
srca, srcb, mem(1, map(srca + srcb)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


load-type (see Table 5-1) 


A memory address is calculated as map(srca + srcb). If no trap occurs during the 
mapping process, then a single byte at the resultant physical address will be read from mem- 
ory, zero-extended to 64 bits, and stored in rdst. ELF[RC] will be set to one if any trap oc- 
curs, and to zero if no trap occurs. 
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Instruction: loadu.b srca,rdst 

Opcode: DO 

Operation: Load one byte of data from memory into a register. 
Operands used: srca, mem(1, map(srca)) 

Results stored: rdst, ELF[RC] 

Legal in: User or Supervisor mode 

Functional unit: Load/Store 

Exceptions: load-type (see Table 5-1) 

Description: 


A memory address is calculated as map(srca). If no trap occurs during the mapping 
process, then a single byte at the resultant physical address will be read from memory, zero- 
extended to 64 bits, and stored in rdst. ELF[RC] will be set to one if any trap occurs, and to 
zero if no trap occurs. 
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Instructions: 


Opcodes: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions 5.3 
loadu.h (srca)[srcb:m],rdst 
loadu.h (srca)[srcb:1],rdst D5 
loadu.h (srca)[srcb:2],rdst D6 


Load two bytes of data from memory into a register. 
srca, srcb, mem(2, map(srca + m*srcb)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


load-type (see Table 5-1) 


A memory address is calculated as map(srca + m*srcb). If no trap occurs during the 
mapping process, then two bytes at the resultant physical address will be read from memory, 
zero-extended to 64 bits, and stored in rdst. ELF[RC] will be set to one if any trap occurs, 
and to zero if no trap occurs. 
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Instruction: loadu.h srca,rdst 

Opcode: D4 

Operation: Load two bytes of data from memory into a register. 
Operands used: srca, mem(2, map(srca)) 

Results stored: rdst, ELF[RC] 

Legal in: User or Supervisor mode 

Functional unit: Load/Store 

Exceptions: load-type (see Table 5-1) 

Description: 


A memory address is calculated as map(srca). If no trap occurs during the mapping 
process, then two bytes at the resultant physical address will be read from memory, zero-ex- 
tended to 64 bits, and stored in rdst. ELF[RC] will be set to one if any trap occurs, and to 
zero if no trap occurs. 
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Instructions: 


Opcodes: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions be ys 


loadu.w (srca)[srcb:m],rdst 


loadu.w (srca)[srcb:1],rdst D9 
loadu.w (srca)[srcb:4],rdst DA 


Load four bytes of data from memory into a register. 
srca, srcb, mem(4, map(srca + m*srcb)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


load-type (see Table 5-1) 


A memory address is calculated as map(srca + m*srcb). If no trap occurs during the 
mapping process, then four bytes at the resultant physical address will be read from memory, 
zero-extended to 64 bits, and stored in rdst. ELF[RC] will be set to one if any trap occurs, 
and to zero if no trap occurs. 
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Instruction: loadu.w srca,rdst 

Opcode: D8 

Operation: Load four bytes of data from memory into a register. 
Operands used: srca, mem(4, map(srca)) 

Results stored: rdst, ELF[RC] 

Legal in: User or Supervisor mode 

Functional unit: Load/Store 

Exceptions: load-type (see Table 5-1) 

Description: 


A memory address is calculated as map(srca). If no trap occurs during the mapping 
process, then four bytes at the resultant physical address will be read from memory, zero-ex- 
tended to 64 bits, and stored in rdst. ELF[RC] will be set to one if any trap occurs, and to 
zero if no trap occurs. 
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Instructions: 


Opcodes: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions 5.3 


load[u].1 (srca)[srcb:m],rdst 


load[u].l (srca)[srcb:1],rdst CD 
load[u].1 (srca)[srcb:8],rdst CE 


Load eight bytes of data from memory into a register. 
srca, srcb, mem(8, map(srca + m*srcb)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


load-type (see Table 5-1) 


A memory address is calculated as map(srca + m*srcb). If no trap occurs during the 
mapping process, then eight bytes at the resultant physical address will be read from memo- 
ry and stored in rdst. ELF[RC] will be set to one if any trap occurs, and to zero if no trap oc- 


curs. 
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Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions 5-13 


load[u].1 srca,rdst 

ce 

Load eight bytes of data from memory into a register. 
srca, mem(8, map(srca)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


load-type (see Table 5-1) 


A memory address is calculated as map(srca). If no trap occurs during the mapping 
process, then eight bytes at the resultant physical address will be read from memory and 
stored in rdst. ELF[RC] will be set to one if any trap occurs, and to zero if no trap occurs. 
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Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions 5:3 


load.b (srca)[srcb:1],rdst 

Cl 

Load one byte of data from memory into a register, extending the sign. 
srca, srcb, mem(1, map(srca + srcb)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


load-type (see Table 5-1) 


A memory address is calculated as map(srca + srcb). If no trap occurs during the 
mapping process, then a single byte at the resultant physical address will be read from mem- 
ory, sign-extended to 64 bits, and stored in rdst. ELF[RC] will be set to one if any trap oc- 
curs, and to zero if no trap occurs. 
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Instruction: load.b srca,rdst 

Opcode: CO 

Operation: Load one byte of data from memory into a register, extending the sign. 
Operands used: srca, mem(1, map(srca)) 

Results stored: rdst, ELF[RC] 

Legal in: User or Supervisor mode 

Functional unit: Load/Store 

Exceptions: load-type (see Table 5-1) 

Description: 


A memory address is calculated as map(srca). If no trap occurs during the mapping 
process, then a single byte at the resultant physical address will be read from memory, sign- 
extended to 64 bits, and stored in rdst. ELF[RC] will be set to one if any trap occurs, and to 
zero if no trap occurs. 
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Instructions: 


Opcodes: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions 5.3 
load.h (srca)[srcb:m],rdst 
load.h (srca)[srcb:1],rdst Ss 
load.h (srca)[srcb:2],rdst C6 


Load two bytes of data from memory into a register, extending the sign. 
srca, srcb, mem(2, map(srca + m*srcb)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


load-type (see Table 5-1) 


A memory address is calculated as map(srca + m*srcb). If no trap occurs during the 
mapping process, then two bytes at the resultant physical address will be read from memory, 
sign-extended to 64 bits, and stored in rdst. ELF[RC] will be set to one if any trap occurs, 
and to zero if no trap occurs. 
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Instruction: load.h srca,rdst 

Opcode: C4 

Operation: Load two bytes of data from memory into a register, extending the sign. 
Operands used: srca, mem(2, map(srca)) 

Results stored: rdst, ELF[RC] 

Legal in: User or Supervisor mode 

Functional unit: Load/Store 

Exceptions: load-type (see Table 5-1) 

Description: 


A memory address is calculated as map(srca). If no trap occurs during the mapping 
process, then two bytes at the resultant physical address will be read from memory, sign-ex- 
tended to 64 bits, and stored in rdst. ELF[RC] will be set to one if any trap occurs, and to 
zero if no trap occurs. 
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Instructions: 


Opcodes: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions Be 
load.w (srca)[srcb:m],rdst 
load.w (srca)[srcb:1],rdst C9 
load.w (srca)[srcb:4],rdst CA 


Load four bytes of data from memory into a register, extending the sign. 
srca, srcb, mem(4, map(srca + m*srcb)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


load-type (see Table 5-1) 


A memory address is calculated as map(srca + m*srcb). If no trap occurs during the 
mapping process, then four bytes at the resultant physical address will be read from memory, 
sign-extended to 64 bits, and stored in rdst. ELF[RC] will be set to one if any trap occurs, 
and to zero if no trap occurs. 
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Instruction: load.w srca,rdst 

Opcode: C8 

Operation: Load four bytes of data from memory into a register, extending the sign. 
Operands used: srca, mem(4, map(srca)) 

Results stored: rdst, ELF[RC] 

Legal in: User or Supervisor mode 

Functional unit: Load/Store 

Exceptions: load-type (see Table 5-1) 

Description: 


A memory address is calculated as map(srca). If no trap occurs during the mapping 
process, then four bytes at the resultant physical address will be read from memory, sign-ex- 
tended to 64 bits, and stored in rdst. ELF[RC] will be set to one if any trap occurs, and to 
zero if no trap occurs. 
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Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions 53 


eloadu.b (srca)[srcb:1],rdst 

Fl 

Load one byte of data from memory into a register. 
srca, srcb, mem(1, map(srca + srcb)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


eload-type (see Table 5-1) 


A memory address is calculated as map(srca + srcb). If no error occurs during the 
mapping process, then a single byte at the resultant physical address will be read from mem- 
ory, zero-extended to 64 bits, and stored in rdst. If one of the special eload error conditions 
occurs, then no trap will be taken and zero will be stored in rdst. ELF[RC] will be set to one 
if any error or trap occurs, and to zero if no error or trap occurs. 
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Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions 5-21 


eloadu.b srca,rdst 

FO 

Load one byte of data from memory into a register. 
srca, mem(1, map(srca)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


eload-type (see Table 5-1) 


A memory address is calculated as map(srca). If no error occurs during the mapping 
process, then a single byte at the resultant physical address will be read from memory, zero- 
extended to 64 bits, and stored in rdst. If one of the special eload error conditions occurs, 
then no trap will be taken and zero will be stored in rdst. ELF[RC] will be set to one if any 
error or trap occurs, and to zero if no error or trap occurs. 
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Instructions: eloadu.h (srca)[srcb:m],rdst 
Opcodes: eloadu.h (srca)[srcb:1],rdst F5 
eloadu.h (srca)[srcb:2],rdst F6 
Operation: Load two bytes of data from memory into a register. 
Operands used: srca, srcb, mem(2, map(srca + m*srcb)) 
Results stored: rdst, ELF[RC] 
Legal in: User or Supervisor mode 
Functional unit: Load/Store 
Exceptions: eload-type (see Table 5-1) 
Description: 


A memory address is calculated as map(srca + m*srcb). If no error occurs during the 
mapping process, then two bytes at the resultant physical address will be read from memory, 
zero-extended to 64 bits, and stored in rdst. If one of the special eload error conditions oc- 
curs, then no trap will be taken and zero will be stored in rdst. ELF[RC] will be set to one if 
any error or trap occurs, and to zero if no error or trap occurs. 
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5.3 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions 5-23 


eloadu.h srca,rdst 

F4 

Load two bytes of data from memory into a register. 
srca, mem(2, map(srca)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


eload-type (see Table 5-1) 


A memory address is calculated as map(srca). If no error occurs during the mapping 
process, then two bytes at the resultant physical address will be read from memory, zero-ex- 
tended to 64 bits, and stored in rdst. If one of the special eload error conditions occurs, then 
no trap will be taken and zero will be stored in rdst. ELF[RC] will be set to one if any error 
or trap occurs, and to zero if no error or trap occurs. 
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5-24 Load Instructions 53 


Instructions: eloadu.w (srca)[srcb:m],rdst 
Opcodes: eloadu.w (srca)[srcb:1],rdst F9 
eloadu.w (srca)[srcb:4],rdst FA 
Operation: Load four bytes of data from memory into a register. 
Operands used: srca, srcb, mem(4, map(srca + m*srcb)) 
Results stored: rdst, ELF[RC] 
Legal in: User or Supervisor mode 
Functional unit: Load/Store 
Exceptions: eload-type (see Table 5-1) 
Description: 


A memory address is calculated as map(srca + m*srcb). If no error occurs during the 
mapping process, then four bytes at the resultant physical address will be read from memory, 
zero-extended to 64 bits, and stored in rdst. If one of the special eload error conditions oc- 
curs, then no trap will be taken and zero will be stored in rdst. ELF[RC] will be set to one if 
any error or trap occurs, and to zero if no error or trap occurs. 


K-1 Architecture Manual CONFIDENTIAL Amdahl Key Computer Labs 


3.3 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions 5-25 


eloadu.w srca,rdst 

F8 

Load four bytes of data from memory into a register. 
srca, mem(4, map(srca)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


eload-type (see Table 5-1) 


A memory address is calculated as map(srca). If no error occurs during the mapping 
process, then four bytes at the resultant physical address will be read from memory, zero-ex- 
tended to 64 bits, and stored in rdst. If one of the special eload error conditions occurs, then 
no trap will be taken and zero will be stored in rdst. ELF[RC] will be set to one if any error 
or trap occurs, and to zero if no error or trap occurs. 
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5-26 


Instructions:: 


Opcodes: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions 5.3 


eload[u].1 (srca)[srcb:m],rdst 


eload[u].l (srca)[srcb:1],rdst ED 
eload[ul].1 (srca)[srcb:8],rdst EE 


Load eight bytes of data from memory into a register. 
srca, srcb, mem(8, map(srca + m*srcb)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


eload-type (see Table 5-1) 


A memory address is calculated as map(srca + m*srcb). If no error occurs during the 
mapping process, then eight bytes at the resultant physical address will be read from memo- 
ry and stored in rdst. If one of the special eload error conditions occurs, then no trap will be 
taken and zero will be stored in rdst. ELF[RC] will be set to one if any error or trap occurs, 
and to zero if no error or trap occurs. 
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ae 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions 5-27 


eload[u].1 srca,rdst 

EC 

Load eight bytes of data from memory into a register. 
srca, mem(8, map(srca)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


eload-type (see Table 5-1) 


A memory address is calculated as map(srca). If no error occurs during the mapping 
process, then eight bytes at the resultant physical address will be read from memory and 
stored in rdst. If one of the special eload error conditions occurs, then no trap will be taken 
and zero will be stored in rdst. ELF[RC] will be set to one if any error or trap occurs, and to 
zero if no error or trap occurs. 
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5-28 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions 5.3 


eload.b (srca)[srcb:1],rdst 

El 

Load one byte of data from memory into a register, extending the sign. 
srca, srcb, mem(1, map(srca + srcb)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


eload-type (see Table 5-1) 


A memory address is calculated as map(srca + srcb). If no error occurs during the 
mapping process, then a single byte at the resultant physical address will be read from mem- 
ory, sign-extended to 64 bits, and stored in rdst. If one of the special eload error conditions 
occurs, then no trap will be taken and zero will be stored in rdst. ELF[RC] will be set to one 
if any error or trap occurs, and to zero if no error or trap occurs. 
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ao 


Instruction: 


Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions 5-29 


eload.b srca,rdst 

EO 

Load one byte of data from memory into a register, extending the sign. 
srca, mem(1, map(srca)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


eload-type (see Table 5-1) 


A memory address is calculated as map(srca). If no error occurs during the mapping 
process, then a single byte at the resultant physical address will be read from memory, sign- 
extended to 64 bits, and stored in rdst. If one of the special eload error conditions occurs, 
then no trap will be taken and zero will be stored in rdst. ELF[RC] will be set to one if any 
error or trap occurs, and to zero if no error or trap occurs. 
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5-30 


Instructions: 


Opcodes: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions Dan 
eload.h (srca)[srcb:m],rdst 
eload.h (srca)[srcb:1],rdst ES 
eload.h (srca)[srcb:2],rdst E6 


Load two bytes of data from memory into a register, extending the sign. 
srca, srcb, mem(2, map(srca + m*srcb)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


eload-type (see Table 5-1) 


A memory address is calculated as map(srca + m*srcb). If no error occurs during the 
mapping process, then two bytes at the resultant physical address will be read from memory, 
sign-extended to 64 bits, and stored in rdst. If one of the special eload error conditions oc- 
curs, then no trap will be taken and zero will be stored in rdst. ELF[RC] will be set to one if 
any error or trap occurs, and to zero if no error or trap occurs. 
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3.3 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions 5-31 


eload.h srca,rdst 

E4 

Load two bytes of data from memory into a register, extending the sign. 
srca, mem(2, map(srca)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


eload-type (see Table 5-1) 


A memory address is calculated as map(srca). If no error occurs during the mapping 
process, then two bytes at the resultant physical address will be read from memory, sign-ex- 
tended to 64 bits, and stored in rdst. If one of the special eload error conditions occurs, then 
no trap will be taken and zero will be stored in rdst. ELF[RC] will be set to one if any error 
or trap occurs, and to zero if no error or trap occurs. 
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5-32 


Instructions: 


Opcodes: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions 5.3 
eload.w (srca)[srcb:m],rdst 
eload.w (srca)[srcb:1],rdst E9 
eload.w (srca)[srcb:4],rdst EA 


Load four bytes of data from memory into a register, extending the sign. 
srca, srcb, mem(4, map(srca + m*srcb)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


eload-type (see Table 5-1) 


A memory address is calculated as map(srca + m*srcb). If no error occurs during the 
mapping process, then four bytes at the resultant physical address will be read from memory, 
sign-extended to 64 bits, and stored in rdst. If one of the special eload error conditions oc- 
curs, then no trap will be taken and zero will be stored in rdst. ELF[RC] will be set to one if 
any error or trap occurs, and to zero if no error or trap occurs. 
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33 


Instruction: 
Opcode: 
Operation: 
Operands used: 
Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Load Instructions 5-33 


eload.w srca,rdst 

E8 

Load four bytes of data from memory into a register, extending the sign. 
srca, mem(4, map(srca)) 

rdst, ELF[RC] 

User or Supervisor mode 

Load/Store 


eload-type (see Table 5-1) 


A memory address is calculated as map(srca). If no error occurs during the mapping 
process, then four bytes at the resultant physical address will be read from memory, sign-ex- 
tended to 64 bits, and stored in rdst. If one of the special eload error conditions occurs, then 
no trap will be taken and zero will be stored in rdst. ELF[RC] will be set to one if any error 
or trap occurs, and to zero if no error or trap occurs. 
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5.4 Store Instructions 


The following instruction descriptions are for all the “normal” store instructions. 
Several special types of store instructions are covered in a later section. 
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5.4 Store Instructions 5-35 


Instruction: store.b srcec,(srca)[srcb:1] 

Opcode: D3 

Operation: Store one byte of data from a register into memory. 
Operands used: srcc, srca, srcb 

Results stored: mem(1, map(srca + srcb)) 

Legal in: User or Supervisor mode 

Functional unit: Load/Store 

Exceptions: store-type (see Table 5-1) 

Description: 


A memory address is calculated as map(srca + srcb). If no trap occurs during the 
mapping process, then a single byte from the least significant part of srcc will be written to 
memory at the resultant physical address. 
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5-36 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Store Instructions 5.4 


store.b Srcc,Srca 

C3 

Store one byte of data from a register into memory. 
SCC, srca 

mem(1, map(srca)) 

User or Supervisor mode 

Load/Store 


store-type (see Table 5-1) 


A memory address is calculated as map(srca). If no trap occurs during the mapping 
process, then a single byte from the least significant part of srcc will be written to memory at 
the resultant physical address. 


K-1 Architecture Manual CONFIDENTIAL Amdahl Key Computer Labs 


5.4 Store Instructions 5-37 


Instructions: store.h srcc,(srca)[srcb:m] 
Opcodes: store.h srec,(srca)[srcb:1] D7 
store.h srcc,(srca)[srcb:2] E7 
Operation: Store two bytes of data from a register into memory. 
Operands used: srcc, srca, srcb 
Results stored: mem(2, srca + map(m*srcb)) 
Legal in: User or Supervisor mode 
Functional unit: Load/Store 
Exceptions: store-type (see Table 5-1) 
Description: 


A memory address is calculated as map(srca + m*srcb). If no trap occurs during the 
mapping process, then two bytes from the least significant part of srcc will be written to 
memory at the resultant physical address. 


Amdahl Key Computer Labs CONFIDENTIAL K-1 Architecture Manual 


5-38 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Store Instructions 5.4 


store.h SYCC,SYCa 

uy 

Store two bytes of data from a register into memory. 
srcc, srca 

mem(2, map(srca)) 

User or Supervisor mode 

Load/Store 


store-type (see Table 5-1) 


A memory address is calculated as map(srca). If no trap occurs during the mapping 
process, then two bytes from the least significant part of srcc will be written to memory at 
the resultant physical address. 
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5.4 


Instructions: 


Opcodes: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Store Instructions 5-39 
store.w srcc,(srca)[srcb:m] 
store.w srcc,(srca)[srcb:1] DB 
store.w srcc,(srca)[srcb:4] EB 


Store four bytes of data from a register into memory. 
srcc, srca, srcb 

mem(4, srca + m*srcb) 

User or Supervisor mode 

Load/Store 


store-type (see Table 5-1) 


A memory address is calculated as map(srca + m*srcb). If no trap occurs during the 
mapping process, then four bytes from the least significant part of srcc will be written to 
memory at the resultant physical address. 
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5-40 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Store Instructions 5.4 


store.w SrcC,Srca 

CB 

Store four bytes of data from a register into memory. 
Srcc, srca 

mem(4, map(srca)) 

User or Supervisor mode 

Load/Store 


store-type (see Table 5-1) 


A memory address is calculated as map(srca). If no trap occurs during the mapping 
process, then four bytes from the least significant part of srcc will be written to memory at 
the resultant physical address. 
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5.4 


Instructions: 


Opcodes: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Store Instructions 5-41 
store. srcec,(srca)[srcb:m] 
store.] srcc,(srca)[srcb:1]__ DF 
store.l srcc,(srca)[srcb:8] EF 


Store eight bytes of data from a register into memory. 
srcc, srca, srcb 

mem(8, map(srca + m*srcb)) 

User or Supervisor mode 

Load/Store 


store-type (see Table 5-1) 


A memory address is calculated as map(srca + m*srcb). If no trap occurs during the 
mapping process, then all eight bytes of srcc will be written to memory at the resultant physi- 


cal address. 
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5-42 


Instruction: 
Opcode: 
Operation: 
Operands used: 
Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Store Instructions 5.4 


store. Srcc,Srca 

CF 

Store eight bytes of data from a register into memory. 
srcc, srca 

mem(8, map(srca)) 

User or Supervisor mode 

Load/Store 


store-type (see Table 5-1) 


A memory address is calculated as map(srca). If no trap occurs during the mapping 
process, then all eight bytes of srcc will be written to memory at the resultant physical ad- 


dress. 
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he Special Load/Store Instructions 5-43 


5.5 Special Load/Store Instructions 


The instructions described in this section are more specialized than the previous load 
and store instructions. 
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5-44 Special Load/Store Instructions bs 


Instruction: swat SrCa,STCC 

Opcode: FC 

Operation: Swap atomically eight bytes of data between memory and a register. 
Operands used: srca, srcc, mem(8, map(srca)) 

Results stored: rdst (same register as srcc), ELF[RC], mem(8, map(srca)) 

Legal in: User or Supervisor mode 

Functional unit: Load/Store 

Exceptions: load-type (see Table 5-1) 

Description: 


A memory address is calculated as map(srca). If no trap occurs during the mapping 
process, then eight bytes at the resultant physical address will be read from memory and ex- 
changed with srcc. ELF[RC] will be set to one if any trap occurs, and to zero if no trap oc- 
curs. 


swat is the only instruction that is guaranteed to atomically read and write a memory 
location. The addressed location should be on a shared page since the operation is not guar- 
anteed to be atomic between processors if the page is not shared. swat is also the only 
load/store instruction that requires both read and write access, and that requires both the Us- 
er Mode Load and User Mode Store bits of the Processor Status register to have the 
same value. 
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3.5 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Special Load/Store Instructions 5-45 


loadcpu srca,rdst 
8D 


Load eight bytes of data from memory into a register, transforming the 
address to be unique to one CPU in a multiprocessor. 


srca, mem(8, map(srca)) 
rdst, ELF[RC] 

User or Supervisor mode 
Load/Store 


load-type (see Table 5-1) 


A memory address is calculated as map(srca), which for this instruction includes an 
implementation-dependent transformation to make the address unique among all of the 
CPUs in a multiprocessor [5-6]. If no trap occurs during the mapping process, then eight 
bytes at the resultant physical address will be read from memory and stored in rdst. 
ELF[RC] will be set to one if any trap occurs, and to zero if no trap occurs. 
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5-46 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Special Load/Store Instructions = 


storecpu SYCC,STCA 
8F 


Store eight bytes of data from a register into memory, transforming the 
address to be unique to one CPU in a multiprocessor. 


Srcc, srca 

mem(8, map(srca)) 

User or Supervisor mode 
Load/Store 


store-type (see Table 5-1) 


A memory address is calculated as map(srca), which for this instruction includes an 
implementation-dependent transformation to make the address unique among all of the 
CPUs in a multiprocessor [5-6]. If no trap occurs during the mapping process, then all eight 
bytes of srcc will be written to memory at the resultant physical address. 
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3.3 Special Load/Store Instructions 5-47 


Instruction: Idecc srca,rdst 

Opcode: DC 

Operation: Load ECC bits from memory. 

Operands used: srca, mem(8, map(srca)) 

Results stored: rdst, ELF[RC] 

Legal in: User or Supervisor mode 

Functional unit: Load/Store 

Exceptions: load-type (see Table 5-1), but without ECC errors 
Description: 


A memory address is calculated as map(srca). If no trap occurs during the mapping 
process, then the cache will be consulted to see if it contains the addressed data. If it does, 
then the instruction completes as if it were a load.I (loading whatever data is in the data 
cache). If it does not (if there is a cache miss), then the ECC bits for the line containing the 
resultant physical address will be read from memory and stored in the cache. The instruction 
will then complete as if there were no cache miss. ELF[RC] will be set to one if any trap oc- 
curs, and to zero if no trap occurs. See Appendix E for details of how the ECC bits are 
stored and accessed. 


Note that using this instruction on shared data will produce unpredictable results (if 
some other processor has the data). For this reason, Idecc should be used with great care 
as a Supervisor mode reference (since supervisor mode references are always shared). 


For diagnostics to properly use this instruction, the ECC bits should be verified a line 
at a time, and the desired line should be flushed from the data cache before the first Idece in- 
struction for that line. 


Amdahl Key Computer Labs CONFIDENTIAL K-1 Architecture Manual 


5-48 Special Load/Store Instructions a5 


Instruction: Idnecc srca,rdst 

Opcode: BC 

Operation: Load uncorrected data from memory. 

Operands used: srca, mem(8, map(srca)) 

Results stored: rdst, ELF[RC] 

Legal in: User or Supervisor mode 

Functional unit: Load/Store 

Exceptions: load-type (see Table 5-1), but without ECC errors 
Description: 


A memory address is calculated as map(srca). If no trap occurs during the mapping 
process, then the cache will be consulted to see if it contains the addressed data. If it does, 
then the instruction completes as if it were a load.l. If it does not (if there is a cache miss), 
then the line containing the resultant physical address will be read from memory and stored 
in the cache. However, the data will be uncorrected, i.e., the error correction possible using 
the ECC bits in memory will not be in effect. The instruction will then complete as if there 
were no cache miss. ELF[RC] will be set to one if any trap occurs, and to zero if no trap oc- 
curs. 


Note that using this instruction on shared data will produce unpredictable results (if 
some other processor has the data). For this reason, Idnece should be used with great care 
as a supervisor mode reference (since supervisor mode references are always shared). 


For diagnostics to properly use this instruction, the ECC bits should be verified a line 
at a time, and the desired line should be flushed from the data cache before the first Idnecc 
instruction for that line. 
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5.6 ELF Flag Instructions 5-49 


5.6 ELF Flag Instructions 


These instructions are used to test ELF flags, trapping if one is set, and to read and 
write the ELF flags for context switching. 
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5-50 


Instruction: 
Opcode: 
Operation: 
Operands used: 
Results stored: 
Legal in: 
Functional unit: 
Exceptions: 


Description: 


ELF Flag Instructions 


echk rdst 

F3 

Test an ELF bit and trap if it is set. 
ELF[RC] 

none 

User or Supervisor mode 
Load/Store 


echk 


5.6 


If the ELF flag whose number is given by the instruction’s RC field is set, then an 


echk trap occurs. 
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5.6 ELF Flag Instructions 5-51 


Instruction: relf rdst 

Opcode: C2 

Operation: Read early load fault bits into a register 
Operands used: ELF<63..0> 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Load/Store 

Exceptions: none 

Description: 


The early load fault bits, ELF[RC]<63..0>, corresponding to registers r63 through r0, | 
are written to rdst. 
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5-52 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


ELF Flag Instructions 


welf srca,srcb 

D2 

Write early load fault bits. 
srca, srcb 

ELF <63..32> or ELF<31..0> 
User or Supervisor mode 
Load/Store 


none 


5.6 


If the low bit of srcb is a zero, then the early load fault bits, ELF[RC]<31..0>, corre- 
sponding to registers r31 through r0, are written from the low-order half of srca. If the low 
bit of srcb is a one, then ELF[RC]<63..32>, corresponding to registers r63 through r32, are 
written from the low-order half of srca. 
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ej Data Watchpoint 5-53 


5.7 Data Watchpoint 


A data watchpoint is a breakpoint on load/store reference addresses. Data watch- 
points are stored in the Data Watchpoint Table. A Data Watchpoint Table entry specifies 
a virtual address to be compared with addresses output by the Load/Store functional unit. 
(See Table 5-1, which lists the instruction classes that can take data watchpoint traps.) If 
enabled, an address match can cause a data watchpoint trap. The low three bits of the ad- 
dress are not compared, so care must be taken when setting up a data watchpoint for an 8-, 
16-, or 32-bit quantity: spurious data watchpoint traps could occur on accesses to other 
parts of the same 64-bit word. 


The format of a Data Watchpoint Table entry is shown in Figure 5-1. 


6 44 0000 
3 87 3210 


Unused Data Watchpoint Word Address 


(16) (45) 


Figure 5-1. Data Watchpoint Table Entry Format 


The Data Watchpoint Word Address field specifies bits <47..3> of the address to 
be compared [5-3]. The ENA bit must be set to enable this Data Watchpoint Table entry; 
if this bit is clear, this data watchpoint will not trap or cause the clocks to be stopped. The 
USR bit controls whether address matches are detected for user or for supervisor references. 
If the USR bit is set, then only user mode references will be watched; if the USR bit is clear, 
then only supervisor mode references will be watched. Data watchpoint traps will not occur 
in Trap State. The SO (store only) bit, if set, disables data watchpoint trapping except for 
store, swat, and zcl instructions. If the SO bit is clear, then both loads and stores can take 
data watchpoint traps. 


The Console can enable data watchpoints to stop the processor’s clocks and interrupt 
the Front-End Processor. See Appendix G for details of the Front-End Processor. Even 
though data watchpoint traps cannot be taken in Trap State, data watchpoint clocks stops 
can happen in Trap State. 


A few instructions are treated specially. The exts and slstrpd instructions will never 
cause data watchpoint traps, though they can cause clock stops. The dflush and pel instruc- 
tions will never cause data watchpoint traps or clock stops. The zcl instruction compares 
only the cache line address (ignoring more low-order bits than other compares). 


The size of the Data Watchpoint Table is implementation-dependent [5-4]. 
The contents of the Data Watchpoint Table cannot be read. 
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5-54 Data Watchpoint ar 


Instruction: wdwp srca,srcb 

Opcode: FB 

Operation: Write a Data Watchpoint Table entry. 
Operands used: srca, srcb 

Results stored: Data Watchpoint Table[srcb] 

Legal in: Supervisor mode only 

Functional unit: Load/Store 

Exceptions: Illegal instruction/privilege violation 
Description: 


If the processor is in user mode, then an illegal instruction/privilege violation trap will 
occur. If the processor is in supervisor mode then the Data Watchpoint Table entry ad- 
dressed by the low-order bits of srcb will be written from the srca operand. The format of a | 
Data Watchpoint Table entry is shown in Figure 5-1. 
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CHAPTER 6. Integer Instructions 


The integer instructions encompass a great variety of operations, including boolean, 
shift, compare, bit count and bit reverse, data moving, and signed and unsigned arithmetic op- 
erations. Operands are usually 64-bit integers, although certain integer operations work on 
8-, 16-, 32-, 33-, and 53-bit quantities. 


6.1 Integer Arithmetic Instructions 


Integer operations use two’s complement arithmetic. Whether the most significant 
bit is considered a sign bit is relevant only when comparisons are involved, or when the re- 
sult has more bits of significance than the operands, such as in multiply. In those cases, both 
signed and unsigned versions of the operation are provided. All operations are performed in 
the integer functional unit except for signed and unsigned integer multiply and divide (which 
are done in the floating-point multiply and floating-point divide/square root units), and 
move.d (which is done in the floating-point add unit). 


Certain common operations, such as ine (incrementing), dec (decrementing), and neg 
(negating), are not provided in hardware because they can be obtained using existing opera- 
tions with constant operands at no cost in time or space. (See Chapter 3.) The assembler, 
however, performs these translations automatically, making it appear that these operations 
exist. Refer to The K-] Assembly Language Reference Manual for more details. 
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6-2 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Arithmetic Instructions 


add srca,srcb,rdst 
5A 

Add two 64-bit integers. 
srca, srcb 

rdst 

User or Supervisor mode 
Integer 


none 


6.1 


The low 64 bits of the sum of two 64-bit integer operands, srca and srcb, are stored in 


rdst. 
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6.1 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Arithmetic Instructions 6-3 


addc f0,srca,srcb,rdst 

58 

Add two 64-bit integers with carry in and out. 
f0, srca, srcb 

rdst, f0 

User or Supervisor mode 

Integer 


none 


The sum, srca + srcb + 1 - f0, of two 64-bit integer operands, srca and srcb, and the 
complement of flag f0 is computed and the low 64 bits of the result are stored in rdst. In addi- 
tion, the complement of the carry out of the sum is returned to f0. This instruction is intended 
for use in multiple precision arithmetic. 
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6-4 Integer Arithmetic Instructions 6.1 


Instruction: addt srca,srcb,rdst 

Opcode: 59 

Operation: Add two 64-bit signed integers with trap on overflow. 
Operands used: srca, srcb 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Integer 

Exceptions: Integer overflow 

Description: 


The low 64 bits of the sum of two 64-bit integer operands, srca and srcb, are stored in 
rdst. An integer overflow trap will occur if the result overflows. Note that such a trap will oc- 
cur regardless of the state of the Arithmetic Trap Enables in the Processor Status regis- 
ter. 


This instruction can be used to implement range checking for languages such as Pas- 
cal and Ada. 
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6.1 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Arithmetic Instructions 


sub srca,srcb,rdst 
4A 

Subtract two 64-bit integers. 
srca, srcb 

rdst 

User or Supervisor mode 
Integer 


none 


The difference, srcb - srca, of two 64-bit integer operands, srca and srcb, is computed 


and the low 64 bits of the result are stored in rdst. 


backwards from what might be expected. 
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6-6 Integer Arithmetic Instructions 6.1 


Instruction: subb f0,srca,srcb,rdst 

Opcode: 48 

Operation: Subtract two 64-bit integers with borrow in and out. 
Operands used: f0, srca, srcb 

Results stored: rdst, f0 

Legal in: User or Supervisor mode 

Functional unit: Integer 

Exceptions: none 

Description: 


The difference, srcb - srca - f0, of two 64-bit integer operands, srca and srcb, and a 
borrow from flag f0 is computed and the low 64 bits of the result are stored in rdst. In addi- 
tion, the borrow out of the sum is returned to f0. This instruction is intended for use in multi- 
ple precision arithmetic. It can also be used to maintain a decrementing loop counter, and 
will set flag f0 when the counter is decremented from a positive number or zero to a negative 
number. 
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6.1 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Arithmetic Instructions 6-7 


subt srca,srcb,rdst 

49 

Subtract two 64-bit signed integers with trap on overflow. 
srca, srcb 

rdst 

User or Supervisor mode 

Integer 


Integer overflow 


The difference, srcb - srca, of two 64-bit signed integer operands, srca and srcb, is 
computed and the low 64 bits of the result are stored in rdst. An integer overflow trap will 
occur if the result overflows. Note that such a trap will occur regardless of the state of the 
Arithmetic Trap Enables in the Processor Status register. 


This instruction can be used to implement range checking for languages such as Pas- 


cal and Ada. 
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6-8 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Arithmetic Instructions 6.1 


multlss srca,srcb,rdst 
A5 


Multiply two 53-bit signed integers and return the low part of the re- 
sult. 


srca, srcb 

rdst 

User or Supervisor mode 
Floating-point Multiply 


none 


The least significant 64 bits of the product of two 53-bit signed integer operands, srca 
and srcb, is computed and stored in rdst. The sign bit of the operands is assumed to be in bit 
position 52, and bits 63 through 53 are ignored. A 106-bit signed product is calculated and 
the low 64 bits are stored as the result. The multhss instruction can be used to obtain the 
high-order portion of the product. 
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6.1 


Instruction: 
Opcode: 


Operation: 


Operands used: 
Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Arithmetic Instructions 6-9 


multhss srca,srcb,rdst 
A7 


Multiply two 53-bit signed integers and return the high part of the re- i 
sult. 


srca, srcb 

rdst 

User or Supervisor mode 
Floating-point Multiply 


none 


The most significant 42 bits of the product of two 53-bit signed integer operands, srca 
and srcb, is computed and stored in rdst. The sign bit of the operands is assumed to be in bit 
position 52, and bits 63 through 53 are ignored. A 106-bit signed product is calculated and 
the high 42 bits are sign-extended to 64 bits and stored as the result. Note that the sign bit 
of the result gives the true sign of the product. The multlss instruction can be used to obtain 
the low-order portion of the product. 
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6-10 Integer Arithmetic Instructions 6.1 


Instruction: multluu srca,srcb,rdst 

Opcode: A9 

Operation: Multiply two 53-bit unsigned integers and return the low part of the re- 
sult. 

Operands used: srca, srcb 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Floating-point Multiply 

Exceptions: none 

Description: 


The least significant 64 bits of the product of two 53-bit unsigned integer operands, sr- 
ca and srcb, is computed and stored in rdst. A 106-bit product is calculated and the low 64 
bits are stored as the result. The multhuu instruction can be used to obtain the high-order 
portion of the product. 
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6.1 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Arithmetic Instructions 6-11 


multhuu srca,srcb,rdst 


AB 


Multiply two 53-bit unsigned integers and return the high part of the re- 
sult. 


srca, srcb 

rdst 

User or Supervisor mode 
Floating-point Multiply 


none 


The most significant 42 bits of the product of two 53-bit unsigned integer operands, sr- 
ca and srcb, is computed and stored in rdst. A 106-bit unsigned product is calculated and the 
high 42 bits are zero-extended to 64 bits and stored as the result. The multluu instruction 
can be used to obtain the low-order portion of the product. 
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6-12 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Arithmetic Instructions 6.1 


multlus srca,srcb,rdst 
AD 


Multiply a 53-bit unsigned integer and a 53-bit signed integer and re- 
turn the low part of the result. 


srca, srcb 

rdst 

User or Supervisor mode 
Floating-point Multiply 


none 


The least significant 64 bits of the product of a 53-bit unsigned integer operand, srca, 
and a 53-bit signed integer operand, srcb, is computed and stored in rdst. The sign bit of srcb 
is assumed to be in bit position 52, and bits 63 through 53 are ignored. A 106-bit signed 
product is calculated and the low 64 bits are stored as the result. The multhus instruction 
can be used to obtain the high-order portion of the product. 


K-1 Architecture Manual CONFIDENTIAL Amdahl Key Computer Labs 


6.1 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Arithmetic Instructions 6-13 


multhus srca,srcb,rdst 
AF 


Multiply a 53-bit unsigned integer and a 53-bit signed integer and re- 
turn the high part of the result. 


srca, srcb 

rdst 

User or Supervisor mode 
Floating-point Multiply 


none 


The most significant 42 bits of the product of a 53-bit unsigned integer operand, srca, 
and a 53-bit signed integer operand, srcb, is computed and stored in rdst. The sign bit of srcb 
is assumed to be in bit position 52, and bits 63 through 53 are ignored. A 106-bit signed 
product is calculated and the high 42 bits are sign-extended to 64 bits and stored as the re- 
sult. Note that the sign bit of the result gives the true sign of the product. The multlus in- 
struction can be used to obtain the low-order portion of the product. 
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6-14 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Arithmetic Instructions 6.1 


divsst srca,srcb,rdst 
A4 


Divide a 33-bit signed integer by a 33-bit signed integer and return a 
64-bit signed integer quotient using truncate rounding mode. 


srca, srcb 

rdst 

User or Supervisor mode 
Floating-point Divide 


Integer divide 


The truncated 64-bit signed integer quotient, srca + srcb, of the two 33-bit signed in- 
teger operands, srca and srcb, is computed and stored in rdst. If srcb is zero (in its low 33 
bits), the result will be zero and, if enabled by the Integer Divide Trap Enable bit in the 
Processor Status register, an integer divide trap will occur. 
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6.1 Integer Arithmetic Instructions 6-15 


Instruction: divssr srca,srcb,rdst 
Opcode: A6 
Operation: Divide a 33-bit signed integer by a 33-bit signed integer and return a 


64-bit signed integer quotient. 


Operands used: srca, srcb 

Results stored: rdst 

Legal in: User or Supervisor mode 
Functional unit: Floating-point Divide 
Exceptions: Integer divide 
Description: 


The rounded 64-bit signed integer quotient, srca + srcb, of the two 33-bit signed inte- 
ger operands, srca and srcb, is computed and stored in rdst. Rounding is performed according 
to the Rounding Mode<1..0> field of the Processor Status register. If srcb is zero (in its 
low 33 bits), the result will be zero and, if enabled by the Integer Divide Trap Enable bit in 
the Processor Status register, an integer divide trap will occur. 
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6-16 Integer Compare Instructions 6.2 


6.2 Integer Compare Instructions 


Only a very small number of compare instructions are necessary to provide a com- 
plete set of operations because the order of operands may be easily switched, and because 
the complements of flags may be tested as easily as the true version. (See Chapter 3). 
Therefore, there are only three types of integer compare instructions. One type tests for 
equal and may be used with either signed or unsigned operands. The other two types test 
for greater than, one with signed operands and the other with unsigned operands. 


Each type of comparison is available in four precisions for operating on 1-, 2-, 4- or 8- 
byte operands. For precisions less than 8 bytes, the least significant part of the 64-bit regis- 
ter operands contain the data to be compared. 
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6.2 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Compare Instructions 6-17 


cmpeq.b srca,srcb fdst 

27 

Compare two 8-bit integers for equality. 
srca, srcb 

Sdst 

User or Supervisor mode 

Integer 


none 


The two 8-bit operands, srca and srcb, are compared. If they are equal, then the flag 
fadst is set. If they are different, then the flag is cleared. 
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6-18 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Compare Instructions 6.2 


cmpeq.h srca,srcb fdst 

26 

Compare two 16-bit integers for equality. 
srca, srcb 

fdst 

User or Supervisor mode 

Integer 


none 


The two 16-bit operands, srca and srcb, are compared. If they are equal, then the flag 
fast is set. If they are different, then the flag is cleared. 
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6.2 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Compare Instructions 6-19 


cmpeq.w srca,srcb fdst 

25 

Compare two 32-bit integers for equality. 
srca, srcb 

fdst 

User or Supervisor mode 

Integer 


none 


The two 32-bit operands, srca and srcb, are compared. If they are equal, then the flag 
fadst is set. If they are different, then the flag is cleared. 
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6-20 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Compare Instructions 6.2 


cmpeq.l srca,srcb fdst ee 
24 

Compare two 64-bit integers for equality. 

srca, srcb 

fast 

User or Supervisor mode 

Integer 


none 


The two 64-bit operands, srca and srcb, are compared. If they are equal, then the flag 
fadst is set. If they are different, then the flag is cleared. 
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6.2 Integer Compare Instructions 6-21 


Instruction: cmpgt.b srca,srcb fdst 
Opcode: 23 

Operation: Compare two 8-bit signed integers. 
Operands used: srca, srcb 

Results stored: fast 

Legal in: User or Supervisor mode 
Functional unit: Integer 

Exceptions: none 

Description: 


The two signed 8-bit operands, srca and srcb, are compared. If srca is greater than 
srcb, then the flag fdst is set, otherwise the flag is cleared. 
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6-22 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Compare Instructions 


cmpgt.h srca,srcb fdst 

22 

Compare two 16-bit signed integers. 
srca, srcb 

Sdst 

User or Supervisor mode 

Integer 


none 


6.2 


The two signed 16-bit operands, srca and srcb, are compared. If srca is greater than 
srcb, then the flag fdst is set, otherwise the flag is cleared. 
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6.2 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Compare Instructions 


cmpgt.w srca,srcb fdst 

21 

Compare two 32-bit signed integers. 
srca, srcb 

fdst 

User or Supervisor mode 

Integer 


none 


6-23 


The two signed 32-bit operands, srca and srcb, are compared. If srca is greater than 
srcb, then the flag fdst is set, otherwise the flag is cleared. 
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6-24 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Compare Instructions 


cmpgt.l srca,srcb fdst 

20 

Compare two 64-bit signed integers. 
srca, srcb 

fdst 

User or Supervisor mode 

Integer 


none 


6.2 


The two signed 64-bit operands, srca and srcb, are compared. If srca is greater than 
srcb, then the flag fdst is set, otherwise the flag is cleared. 
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6.2 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Compare Instructions 


cmpugt.b srca,srcb fast 

2B 

Compare two 8-bit unsigned integers. 
srca, srcb 

fdst 

User or Supervisor mode 

Integer 


none 


6-25 


The two unsigned 8-bit operands, srca and srcb, are compared. If srca is greater than 
srcb, then the flag fdst is set, otherwise the flag is cleared. 
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6-26 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Compare Instructions 


cmpugt.h srca,srcb fdst 

2A 

Compare two 16-bit unsigned integers. 
srca, srcb 

fdst 

User or Supervisor mode 

Integer 


none 


6.2 


The two unsigned 16-bit operands, srca and srcb, are compared. If srca is greater 
than srcb, then the flag fdst is set, otherwise the flag is cleared. 
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6.2 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Compare Instructions 


cmpugt.w _srca,srcb,fdst 

29 

Compare two 32-bit unsigned integers. 
srca, srcb 

Sdst 

User or Supervisor mode 

Integer 


none 


6-27 


The two unsigned 32-bit operands, srca and srcb, are compared. If srca is greater 
than srcb, then the flag fdst is set, otherwise the flag is cleared. 
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6-28 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Integer Compare Instructions 6.2 


cmpugt.l srca,srcb fdst 

28 

Compare two 64-bit unsigned integers. 
srca, srcb 

fdst 

User or Supervisor mode 

Integer 


none 


The two unsigned 64-bit operands, srca and srcb, are compared. If srca is greater 
than srcb, then the flag fdst is set, otherwise the flag is cleared. 
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6.3 Data Moving Instructions 6-29 


6.3 Data Moving Instructions 


The data moving instructions move quantities of any data type. One class of these in- 
structions can move one of two operands based on the value of a flag. 
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6-30 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Data Moving Instructions 


move srca,rdst 
4B 

Move a 64-bit integer. 
srca 

rdst 

User or Supervisor mode 
Integer 


none 


The 64-bit integer, srca, is stored unchanged in rdst. 
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6.3 
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6.3 


Instruction: 
Opcode: 
Operation: 
Operands used: 
Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


The 64-bit integer, srca, is stored unchanged in rdst. 


Data Moving Instructions 


move.d srca,rdst 
93 

Move a 64-bit integer. 
srca 

rdst 

User or Supervisor mode 
Floating-point Add 


none 


6-31 


This is the same operation as 


move but is performed in the floating-point add functional unit, which can run in parallel with 
the integer unit, thus providing more bandwidth for copying data between registers. The tim- 
ing of this instruction is different than that of move. See Appendix C for details. 
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6-32 


Instruction: 
Opcode: 
Operation: 
Operands used: 
Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Data Moving Instructions 


zext.b srca,rdst 

2C 

Zero-extend an 8-bit integer to 64 bits. 
srca 

rdst 

User or Supervisor mode 

Integer 


none 


6.3 


The 8-bit integer, srca, is zero-extended to 64 bits and stored in rdst. 
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6.3 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Data Moving Instructions 


zext.h srca,rdst 

2D 

Zero-extend a 16-bit integer to 64 bits. 
srca 

rdst 

User or Supervisor mode 

Integer 


none 


6-33 


The 16-bit integer, srca, is zero-extended to 64 bits and stored in rdst. 
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6-34 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Data Moving Instructions 


zext.w srca,rdst 

2E 

Zero-extend a 32-bit integer to 64 bits. 
srca 

rdst 

User or Supervisor mode 

Integer 


none 


6.3 


The 32-bit integer, srca, is zero-extended to 64 bits and stored in rdst. 
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6.3 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Data Moving Instructions 


sext.b srca,rdst 

4C 

Sign-extend an 8-bit integer to 64 bits. 
srca 

rdst 

User or Supervisor mode 

Integer 


none 


6-35 


The 8-bit integer, srca, is sign-extended to 64 bits and stored in rdst. 
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6-36 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Data Moving Instructions 


sext.h srca,rdst 

4D 

Sign-extend a 16-bit integer to 64 bits. 
srca 

rdst 

User or Supervisor mode 

Integer 


none 


6.3 


The 16-bit integer, srca, is sign-extended to 64 bits and stored in rdst. 
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6.3 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Data Moving Instructions 


sext.w srca,rdst 

4E 

Sign-extend a 32-bit integer to 64 bits. 
srca 

rdst 

User or Supervisor mode 

Integer 


none 


6-37 


The 32-bit integer, srca, is sign-extended to 64 bits and stored in rdst. 
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6-38 Data Moving Instructions 6.3 


Instructions: sel fsrc,srca,srcb,rdst 

Opcodes: sel f0,srca,srcb,rdst 40 
sel fl,srca,srcb,rdst 41 
sel f2,srca,srcb,rdst 42 
sel f3,srca,srcb,rdst 43 
sel f4,srca,srcb,rdst 44 
sel f5,srca,srcb,rdst 45 
sel f6,srca,srcb,rdst 46 

Operation: Move one of two 64-bit integers. 

Operands used: fsrc, srca, srcb 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Integer 

Exceptions: none 

Description: 


One of the 64-bit integers, srca or srcb, is stored unchanged in rdst. If fsrc is a zero, 
then srca is moved, otherwise srcb is moved. 
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6.4 Boolean Instructions 6-39 


6.4 Boolean Instructions 


The boolean instructions perform and, or, and xor operations on 64-bit operands. 
Certain operations, such as not and the or of the second operand with the complement of the 
first operand, are not provided in hardware because they can be obtained using existing oper- 
ations with constant operands or with the operands reversed at no loss in time or space. 
(See Chapter 3.) The assembler, however, performs these translations automatically, mak- 
ing it appear that all operations exist. Refer to The K-1] Assembly Language Reference Manu- 
al for more details. 
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6-40 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Boolean Instructions 6.4 


and srca,srcb,rdst 

63 

Perform the bitwise and of two 64-bit integers. 
srca, srcb 

rdst 

User or Supervisor mode 

Integer 


none 


The bitwise and of two 64-bit integer operands, srca and srcb, is stored in rdst. 


K-1 Architecture Manual CONFIDENTIAL Amdahl Key Computer Labs 


6.4 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Boolean Instructions 6-41 


andtc srca,srcb,rdst 
69 


Perform the bitwise and of two 64-bit integers, complementing one 
first. 


srca, srcb 

rdst 

User or Supervisor mode 
Integer 


none 


The bitwise and of the 64-bit integer operand, srca, and the complement of the 64-bit 
integer operand, srcb, is stored in rdst. 
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6-42 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Boolean Instructions 6.4 


andcc srca,srcb,rdst 

65 

Perform the bitwise and of the complements of two 64-bit integers. 
srca, srcb 

rdst 

User or Supervisor mode 

Integer 


none 


The bitwise and of the complement of the 64-bit integer operand, srca, and the com- 
plement of the 64-bit integer operand, srcb, is stored in rdst. 
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6.4 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Boolean Instructions 6-43 


or srca,srcb,rdst 

6A 

Perform the bitwise or of two 64-bit integers. 
srca, srcb 

rdst 

User or Supervisor mode 

Integer 


none 


The bitwise or of two 64-bit integer operands, srca and srcb, is stored in rdst. 
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6-44 


Instruction: 
Opcode: 
Operation: 
Operands used: 
Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Boolean Instructions 6.4 


ortc srca,srcb,rdst 

6F 

Perform the bitwise or of two 64-bit integers, complementing one first. 
srca, srcb 

rdst 

User or Supervisor mode 

Integer 


none 


The bitwise or of the 64-bit integer operand, srca, and the complement of the 64-bit 
integer operand, srcb, is stored in rdst. 


K-1 Architecture Manual CONFIDENTIAL Amdahl Key Computer Labs 


6.4 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Boolean Instructions 6-45 


orce srca,srcb,rdst 

6C 

Perform the bitwise or of the complements of two 64-bit integers. 
srca, srcb 

rdst 

User or Supervisor mode 

Integer 


none 


The bitwise or of the complement of the 64-bit integer operand, srca, and the comple- 
ment of the 64-bit integer operand, srcb, is stored in rdst. 
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6-46 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Boolean Instructions 6.4 


xor srca,srcb,rdst 

68 

Perform the bitwise xor of two 64-bit integers. 
srca, srcb 

rdst 

User or Supervisor mode 

Integer 


none 


The bitwise exclusive-or of two 64-bit integer operands, srca and srcb, is stored in 


rdst. 
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6.4 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Boolean Instructions 6-47 


xnor srca,srcb,rdst 
67 


Perform the bitwise xor of two 64-bit integers, and complement the 
result. 


srca, srcb 

rdst 

User or Supervisor mode 
Integer 


none 


The complement of the bitwise exclusive-or of the 64-bit integer operand, srca, and 
the 64-bit integer operand, srcb, is stored in rdst. 
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6-48 Shift Instructions 6.5 


6.5 Shift Instructions 


This section describes instructions that shift and rotate 64-bit integers. In addition, 
special shift instructions, dshfl and dshfr, which return 64 bits of a shifted 128-bit operand 
(composed of two independent 64-bit operands), are described. 
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6.5 Shift Instructions 6-49 


Instruction: rot srca,srcb,rdst 
Opcode: 66 

Operation: Rotate a 64-bit integer left or right. 
Operands used: srca, srcb 

Results stored: rdst 

Legal in: User or Supervisor mode 
Functional unit: Integer 

Exceptions: none 

Description: 


The signed integer operand srca gives the number of places to rotate the 64-bit inte- 
ger operand srcb. The result is stored in rdst. If srca is positive, then srcb is rotated left by 
srca places. If srca is negative, then srcb is rotated right by -srca places. Note that because 
two’s complement arithmetic is used, and because the operand length (64 bits) is a power of 
two, only the low six bits of srca have any effect on the operation. Therefore, any value of sr- 
ca is legal and the correct rotation will occur. 
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6-50 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Shift Instructions 6.5 


shf srca,srcb,rdst 

60 

Logically shift a 64-bit integer left or right. 
srca, srcb 

rdst 

User or Supervisor mode 

Integer 


none 


The signed integer operand srca gives the number of places to shift the 64-bit integer 
operand srcb. The result is stored in rdst. If srca is positive, then srcb is shifted left by srca 
places: zero bits are entered at the least significant end, and bits shifted off the most signifi- 
cant end are lost. If srca is negative, then srcb is shifted right by -srca places: zero bits are 
entered at the most significant end, and bits shifted off the least significant end are lost. 
Note that if the shift count is greater than 63 or less than -63, the result will be zero. 
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6.5 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Shift Instructions 


ashf srca,srcb,rdst 


61 


Arithmetically shift a 64-bit integer left or right. 


srca, srcb 

rdst 

User or Supervisor mode 
Integer 


none 


6-51 


The signed integer operand srca gives the number of places to shift the 64-bit integer 
operand srcb. The result is stored in rdst. If srca is positive, then srcb is shifted left by srca 
places: zero bits are entered at the least significant end, and bits shifted off the most signifi- 
cant end are lost. If srca is negative then srcb is shifted right by -srca places: the sign bit is 
repeatedly entered at bit 63, and bits shifted off the least significant end are lost. Note that if 
the shift count is 64 or greater the result will be zero. But if the shift count is equal to or 
more negative than -63, then the result will be zero or minus one depending on whether srcb 
was positive or negative, respectively. 
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6-52 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Shift Instructions 6.5 


dshfl srca,srcb,srcc 

6D 

Shift two 64-bit integers left 0 to 127 places. 
srca, srcb, srcc 

rdst (same register as srcc) 

User or Supervisor mode 

Integer 


none 


The two 64-bit integer operands, srcc and srcb are concatenated to form a 128-bit op- 
erand; srcc contributes the most significant bits, and srcb the least significant bits. The com- 
bined entity is shifted left a number of places determined by the low seven bits of the integer 
operand srca. The high 64 bits of the result are stored in rdst. Zero bits are entered at the 
least significant end and bits shifted off the most significant end are lost. Note that srcc may 
not be a constant, and that srcc and rdst both refer to the same register, R[RC]. 
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“y 


6.5 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Shift Instructions 6-53 


dshfr srca,srcb,srcc 

6E 

Shift two 64-bit integers right 1 to 128 places. 
srca, srcb, srcc 

rdst (same register as srcc) 

User or Supervisor mode 

Integer 


none 


The two 64-bit integer operands, srcb and srcc are concatenated to form a 128-bit op- 
erand; srcb contributes the most significant bits and srcc the least significant bits. The com- 
bined entity is shifted right a number of places determined by the integer operand srca. This 
operand is intended to be in the range -1 to -128 and will result in right shifts of from 1 to 
128 places. The low 64 bits of the result are stored in rdst. Zero bits are entered at the most 
significant end and bits shifted off the least significant end are lost. Note that srcc may not 
be a constant, and that srcc and rdst both refer to the same register, R[RC]. 

In more detail, the shift count is computed by replacing bits 7 through 63 of srca with 


all ones and then taking the two’s complement of the result. For example, if srca contains 
zero, then the shift count will be 128. 
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6-54 Bit Count and Reverse Instructions 6.6 


6.6 Bit Count and Reverse Instructions 


The instructions in this category are used to reverse the bits in a 64-bit integer and to 
perform certain bit counting functions. 
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6.6 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Bit Count and Reverse Instructions 


bitrev srca,rdst 

au 

Reverse the bits of a 64-bit integer. 
srca 

rdst 

User or Supervisor mode 

Integer 


none 


6-55 


The bits of the 64-bit integer, srca, are reversed and stored in rdst. In other words, 
bit position i is sent to bit position 63 - i. 
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6-56 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Bit Count and Reverse Instructions 6.6 


bitcnt.w srca,rdst 

5D 

Count the number of one bits in a 32-bit integer. 
srca 

rdst 

User or Supervisor mode 

Integer 


none 


The count of the number of bits in the 32-bit integer, srca, that are set to one is stored 


in rdst. 
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6.6 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Bit Count and Reverse Instructions 6-57 


Izent srca,rdst 

5E 

Count the number of leading zero bits in a 64-bit integer. 
srca 

rdst 

User or Supervisor mode 

Integer 


none 


The count of the number of leading (most significant) bits of the 64-bit integer, srca, 
that are set to zero is stored in rdst. A zero input produces a result of 64. 
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6-58 Flag Instructions 6.7 


6.7 Flag Instructions 


Flags can be written by integer and floating-point compare instructions and by the ad- 
de and subb instructions (all described elsewhere). In addition, they can be operated on di- 
rectly with the boof instructions. The assembler supports a number of different mnemonics 
for boof, including the use of flags as sources and destinations with the usual boolean mne- 
monics (and, or, andce, etc.). Refer to The K-1] Assembly Language Reference Manual for 
more details. 
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6.7 Flag Instructions 6-59 


Instruction: boof srca,fdst 

Opcode: 4F 

Operation: Perform a boolean operation on flags, storing the result in a flag. 
Operands used: srca, F[srca<2..0>], F[srca<5..3>] 

Results stored: fast 

Legal in: User or Supervisor mode 

Functional unit: Integer 

Exceptions: none 

Description: 


The low 9 bits of srca specify a boolean function and two flags on which to operate. 
The result of the operation is stored in flag fdst. Figure 6-1 shows the format of the srca op- 
erand, and Table 6-1 gives the encoding of the boolean function. In most applications, the sr- 


ca operand will be a short constant, not a register. 
000000 
8 65 320 
BC | FA | FB 
(3) ] (3)] (3) 


6 
3 


Figure 6-1. boof srca Argument 


Table 6-1. boof Code (BC) Decoding 


boof code O : 

(BC) peration 
0 FA and (not FB) 
1 (not FA) and (not FB) 
Z FA xor FB 
3 (not FA) or (not FB) 
4 FA and FB 
5 not (FA xor FB) 
6 FA or FB 
7 FA or (not FB) 
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6-60 Flag Instructions 6.7 


Instruction: boof srca,rdst 

Opcode: 2F 

Operation: Perform a boolean operation on flags, storing the result in a register. 
Operands used: srca, F[srca<2..0>], F[srca<5..3>] 

Results stored: rdst 

Legal in: User or Supervisor Mode 

Functional Unit: Integer 

Exceptions: none 

Description: 


The low 9 bits of srca specify a boolean function and two flags on which to operate. 
The single bit result of the operation is zero-extended to 64 bits and stored in rdst. Figure 6- 
1 shows the format of the srca operand, and Table 6-1 gives the encoding of the boolean func- 
tion. 
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6.8 Check Instructions 6-61 


6.8 Check Instructions 
The check instructions are used to determine if a 64-bit signed or unsigned integer 


can fit in a lesser precision. They do not store any results. Instead, if the operand will not fit 
in the smaller precision, a check trap occurs. 


Amdahl Key Computer Labs CONFIDENTIAL K-1 Architecture Manual 


6-62 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Check Instructions 6.8 


chk.b srca 

50 

Check if a 64-bit signed integer can fit in one byte. 
srca 

none 

User or Supervisor mode 

Integer 


Check 


The high 56 bits of the signed 64-bit integer, srca, are examined. If they all have the 
same value as bit 7, the sign bit of the low byte, then nothing happens. If any bit is different, 
then a check trap occurs. 
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6.8 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Check Instructions 6-63 


chk.h oe 

51 

Check if a 64-bit signed integer can fit in two bytes. 
srca 

none 

User or Supervisor mode 

Integer 


Check 


The high 48 bits of the signed 64-bit integer, srca, are examined. If they all have the 
same value as bit 15, the sign bit of the low 2-byte group, then nothing happens. If any bit is 
different, then a check trap occurs. 
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6-64 


Instruction: 
Opcode: 
Operation: 
Operands used: 
Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Check Instructions 6.8 


chk.w srca 

52 

Check if a 64-bit signed integer can fit in four bytes. 
srca 

none 

User or Supervisor mode 

Integer 


Check 


The high 32 bits of the signed 64-bit integer, srca, are examined. If they all have the 
same value as bit 31, the sign bit of the low 4-byte group, then nothing happens. If any bit is 
different, then a check trap occurs. 
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6.8 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Check Instructions 6-65 


chku.b srca 

53 

Check if a 64-bit unsigned integer can fit in one byte. 
srca 

none 

User or Supervisor mode 

Integer 


Check 


The high 56 bits of the unsigned 64-bit integer, srca, are examined. If they are all ze- 
ro, then nothing happens. If any bit is different from zero, then a check trap occurs. 
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6-66 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Check Instructions 6.8 


chku.h srca 

54 

Check if a 64-bit unsigned integer can fit in two bytes. 
srca 

none 

User or Supervisor mode 

Integer 


Check 


The high 48 bits of the unsigned 64-bit integer, srca, are examined. If they are all ze- 
ro, then nothing happens. If any bit is different from zero, then a check trap occurs. 
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6.8 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Check Instructions 6-67 


chku.w srca 

55 

Check if a 64-bit unsigned integer can fit in four bytes. 
srca 

none 

User or Supervisor mode 

Integer 


Check 


The high 32 bits of the unsigned 64-bit integer, srca, are examined. If they are all ze- 
ro, then nothing happens. If any bit is different from zero, then a check trap occurs. 
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CHAPTER 7. Transfer of Control Instructions 


The transfer of control instructions are used to change the value of the Program 
Counter, and include the subroutine call instructions and the absolute jump instructions. 
This chapter does not describe the assembly language br instruction. Rather than being an 
actual instruction, br indicates the use of PC-relative branch format, which does not require 
an opcode and is permitted only in the I1 half of an instruction word. All transfer of control in- 
structions (other than br) must use the 64-bit absolute branch or register branch instruc- 
tion formats or an illegal instruction/privilege violation trap will occur. (See Chapter 3 for 
more details on instruction formats.) 


The subroutine call instructions move a value (the return PC) to a register and then 
branch. The return PC will not be stored if the subroutine call instruction is not actually exe- 
cuted (i.e., if it is conditionally disabled). A subroutine returns by branching to an address in 
a register (the linkage register used by the subroutine call instruction). The return PC is a 
function of the delayed branch control field of the subroutine call instruction, and will be that 
of the call plus a small offset, as given in Table 3-2. This is true even if the call is in a delay 
slot and regardless of what instructions are in the call’s delay slots; the return PC calcula- 
tion is always based solely on the address of the call. 


For all instructions described in this chapter, and for any PC-relative branch, the new 
PC (and the stored PC for call instructions) will be truncated to 32 bits if the processor is in 
user mode and the Small Address Compatibility Mode bit in the Processor Status regis- 
ter is on. 
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7-2 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Transfer of Control Instructions T. 


call address,rdst 

09 

Move the PC to a register and jump to an absolute address. 
none 

rdst 

User or Supervisor mode 

Fetch 


Illegal instruction/privilege violation 


A return PC is stored in rdst. The return PC is an offset from the address of the call 
instruction, and depends upon the DC field as given in Table 3-2. The third following instruc- 
tion word will be fetched from address. This form of the call instruction must be used with 
the absolute branch instruction format; an illegal instruction/privilege violation trap will re- 
sult from its use with any other format. It is intended to be used as a subroutine call to an 
absolute address. The use of the call instruction in Trap State will produce an undefined re- 


turn PC. 
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Ts Transfer of Control Instructions 7-3 


Instruction: call (srca),rdst 

Opcode: 05 

Operation: Move the PC to a register and jump to an address in a register. 
Operands used: srca 

Results stored: rdst 

Legal in: User or Supervisor mode 

Functional unit: Fetch 

Exceptions: Illegal instruction/privilege violation 

Description: 


A return PC is stored in rdst. The return PC is an offset from the address of the call 
instruction, and depends upon the DC field as given in Table 3-2. The third following instruc- 
tion word will be fetched from an address specified by the operand srca. The low two bits of 
srca are treated as if they were zero. This form of the call instruction must be used with the 
register branch instruction format; an illegal instruction/privilege violation trap will result 
from its use with any other format. It is intended to be used as a subroutine call instruction 
where the address of the subroutine is in a register. The use of the call instruction in Trap 
State will produce an undefined return PC. 
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7-4 Transfer of Control Instructions Ts 


Instruction: jump address 

Opcode: 08 

Operation: Jump to an absolute address. 
Operands used: none 

Results stored: none 

Legal in: User or Supervisor mode 

Functional unit: Fetch 

Exceptions: Illegal instruction/privilege violation 
Description: 


The third following instruction word will be fetched from address. This form of the 
jump instruction must be used with the absolute branch instruction format; an illegal instruc- 
tion/privilege violation trap will result from its use with any other format. 
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Te Transfer of Control Instructions 7-5 


Instruction: jump (srca) 

Opcode: 04 

Operation: Jump to an address in a register. 
Operands used: srca 

Results stored: none 

Legal in: User or Supervisor mode 

Functional unit: Fetch 

Exceptions: Illegal instruction/privilege violation 
Description: 


The third following instruction word will be fetched from an address specified by the 
operand srca. The low two bits of srca are treated as if they were zero. This form of the jump 
instruction must be used with the register branch instruction format; an illegal instruc- 
tion/privilege violation trap will result from its use with any other format. It is intended to be 
used as a subroutine return instruction or for computed branches (as with dispatch tables or 
dynamic linking). 
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CHAPTER 8. Processor Status Register and Timer Instructions 


These instructions are used to read and write the Processor Status register (Table 
2-5) and the timers. The wps instruction writes the entire Processor Status register in su- 
pervisor mode, but only writes the upper 32 bits in user mode. Two special instructions, spl 
and srm, more efficiently modify specific fields in the Processor Status register. 
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8-2 Processor Status Register and Timer Instructions 8. 


Instruction: rps rdst 

Opcode: OB 

Operation: Read Processor Status register. 
Operands used: none 

Results stored: rdst 

Legal in: User or Supervisor mode 
Functional unit: Integer 

Exceptions: none 

Description: 


The Processor Status register is stored in rdst. Unused bits of the Processor Sta- 
tus register read as zero, but application programs must not rely on this fact since this may 
not be the case in other versions of the K-1 architecture. 
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8. Processor Status Register and Timer Instructions 8-3 


Instruction: wps srca 

Opcode: OC 

Operation: Write Processor Status register. 
Operands used: srca 

Results stored: none 

Legal in: User or Supervisor mode 
Functional unit: Integer 

Exceptions: none 

Description: 


If the processor is in supervisor mode then the entire operand, srca, is stored in the 
Processor Status register. If the processor is in user mode, then only the high-order 32 
bits of srca are stored in the high-order 32 bits of the Processor Status register; the low-or- 
der 32 bits of the Processor Status register remain unaffected. A trap may occur if an at- 
tempt is made to set undefined bits in the Processor Status register [8-2]. Note that the 
Processor Version Number field is read-only and is not affected by this instruction. Not 
all of the effects of executing a wps instruction happen immediately. See Appendix C for a 
description of the delays involved. 
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8-4 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Processor Status Register and Timer Instructions 8. 
spl srca,rdst 
0D 


Set the Processor Priority Level field of the Processor Status regis- 
ter, returning its old value. 


srca 

rdst 

Supervisor mode only 
Integer 


Illegal instruction/privilege violation 


If the processor is in user mode, then an illegal instruction/privilege violation trap oc- 
curs. Otherwise, the Processor Priority Level<3..0> field of the Processor Status regis- 
ter is set to the low four bits of srca. The previous value of this field is zero-extended to 64 
bits and stored in rdst. Note that this instruction does not take effect immediately. See Ap- 
pendix C for more details. 
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Instruction: 


Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Processor Status Register and Timer Instructions 8-5 
srm srca,rdst 


56 


Set the Rounding Mode field of the Processor Status register, return- 
ing its old value. 


srca 

rdst 

User or Supervisor mode 
Integer 


none 


The Rounding Mode<1..0> field of the Processor Status register is set to the low 
two bits of srca. The previous value of this field is zero-extended to 64 bits and stored in 
rdst. Note that this instruction does not take effect immediately. See Appendix C for more 


details. 
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8-6 Processor Status Register and Timer Instructions 8. 


Instruction: rut rdst 
Opcode: 5F 

Operation: Read the Uptime Counter. 
Operands used: none 

Results stored: rdst 

Legal in: User or Supervisor mode 
Functional unit: Integer 

Exceptions: none 

Description: 


The contents of the Uptime Counter are written to rdst. In any implementation where 
the Uptime Counter is less than 64 bits, the high-order bits of rdst are set to zero. 
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Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Processor Status Register and Timer Instructions 8-7 
wit srca 
1F 


Write the Interval Timer register. 
srca 

none 

Supervisor mode only 

Integer 


Illegal instruction/privilege violation 


If the processor is in user mode, then an illegal instruction/privilege violation trap oc- 
curs. Otherwise, the Interval Timer register is written from the low-order bits of srca. 
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CHAPTER 9. Virtual Memory and Cache Instructions 


These very special instructions are used to modify the processor page tables and 
caches. Some of them behave differently in user mode than in supervisor mode. 
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9-2 Virtual Memory and Cache Instructions 9. 


Instruction: lipage srca 

Opcode: 18 

Operation: Load Instruction Page Table entry. 
Operands used: srca 

Results stored: none 

Legal in: Supervisor mode only 

Functional unit: Fetch 

Exceptions: Illegal instruction/privilege violation 
Description: 


If the processor is in user mode, an illegal instruction/privilege violation trap will oc- 
cur. If the processor is in supervisor mode, the operand, srca, is used to update the instruc- 
tion page table. The format of the srca operand is given in Figure 2-7 [9-1]. 
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Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Virtual Memory and Cache Instructions 


Idpage srca,srcb 

19 

Load Data Page Table entry. 
srca, srcb 

none 


Supervisor mode only 
Load/Store 


Illegal instruction/privilege violation 


9-3 


If the processor is in user mode, an illegal instruction/privilege violation trap will oc- 
cur. The address of the page table entry to be affected is calculated by hashing the virtual 
page number field of srca with the Process Key<12..0> field of the Processor Status regis- 
ter [9-3]. If the processor is in supervisor mode, the page table entry at the resulting table 
address is replaced by srcb. The format of the srca operand is given in Figure 2-8, and the 
format of the srcb operand is given in Figure 2-9 [9-2]. 
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9-4 Virtual Memory and Cache Instructions 9. 


Instruction: ickill (srca) 

Opcode: 06 

Operation: Invalidate an instruction cache line and jump to an address in a regis- 
ter. 

Operands used: srca 

Results stored: none 

Legal in: User or Supervisor mode 

Functional unit: Fetch 

Exceptions: Illegal instruction/privilege violation, instruction page map miss 

Description: 


This instruction must be used with the register branch instruction format; an illegal 
instruction/privilege violation trap will result from its use with any other format. _ ickill 
branches to the address given by srca; the low two bits of srca are treated as if they were ze- 
ro. As is the case for all branch instructions, it is the third following instruction word fetch 
(called the target fetch) that is affected by this instruction. Prior to that fetch, however, a 
particular instruction cache line will be “killed”. The instruction cache line to be killed is 
based on the address in srca in an implementation-dependent way [9-4], but will always 
contain the instruction word addressed by srca. It is important to note that ickill does not 
check for a hit in the instruction cache — the cache line addressed by srca is killed whether 
there is a hit or not. This can be used by the operating system to kill instructions at arbitrary 
physical addresses (even though instruction addressing in supervisor mode is limited), 
since only the low bits of the address matter for ickill. 


During the target fetch, neither the instruction cache, instruction stache, nor memory 
will be referenced. Instead, a null instruction word will be created with both the IO and [1 in- 
structions disabled. An instruction page map miss can still result on the target instruction 
fetch. In case of an instruction page map miss, no line in the cache will be killed and an in- 
struction page map miss trap will be taken. 


The ickill instruction does not guarantee that the addressed line is killed in the in- 
struction stache. The iskill instruction may be used to kill the entire contents of the instruc- 
tion stache. 


A branch instruction executed in the first delay slot of an ickill will prevent continuing 
execution at the address given in srca. In the following example, instruction words are 
fetched from the following locations: A, B, C, XYZ (which is killed in the instruction cache 
and is not executed), D, D+8, etc. 
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9. Virtual Memory and Cache Instructions 9-5 


A: ickill XYZ, aa 

B: nop br D, bb 
Cc nop nop 

D: 

XYZ: 
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9-6 Virtual Memory and Cache Instructions 2. 


Instruction: iskill 

Opcode: 16 

Operation: Invalidate the entire instruction stache. 
Operands used: none 

Results stored: none 

Legal in: User or Supervisor mode 

Functional unit: Fetch 

Exceptions: none 

Description: 


The entire contents of the instruction stache are invalidated. Note that this instruc- 
tion will always be executed, even when used with an instruction format that supports condi- 
tional execution. In other words, it is not possible to conditionally execute this instruction. 
This instruction does not take effect immediately [9-9]. 
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9. Virtual Memory and Cache Instructions 9-7 


Instruction: dflush srca,srcb 
Opcode: 1D 

Operation: Manipulate a data cache line. 
Operands used: srca, srcb 

Results stored: none 

Legal in: User or Supervisor mode 
Functional unit: Load/Store 

Exceptions: dflush-type (see Table 5-1) 
Description: 


The srca operand contains either the address of a memory location, or of a particular 
data cache line; there are no address alignment requirements for srca. This instruction per- 
forms some operation on the data cache line that contains the address srca (if there is such a 
line), or on the data cache line addressed directly by srca. Which of these two interpreta- 
tions of srca is in effect is determined by a control bit in srcb. The operation to be performed 
is controlled by the low-order bits of srcb as enumerated in Table 9-1. 


Table 9-1. dflush srcb Control Functions 


Write data only (don’t write ECC bits) 
Write ECC bits only (treat data as ECC bits) 


Address a specific data cache line 
Broadcast flush 

Validate (update) the memory system 
Kill data cache line 


dflush is considered to be a store for purposes of access checking. Therefore, if the 
User Mode Store bit is on in the Processor Status register, the dflush is considered to be 
a user mode operation. In user mode operations, the User Protection bit in the Processor 
Status register controls the type of access granted. 


If bit 3 of srcb is zero, then srca is considered to be a virtual address, which is then 
mapped, possibly resulting in a data map miss trap or an illegal access trap if write permis- 
sion is not granted for the addressed page. If there is no trap, then the data cache address of 
srca will be computed. The addressed data cache line may or may not actually contain a valid 
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entry that may or may not be the entry for srca. (If srca does have an entry, it will be the ad- 
dressed data cache line). The case where the addressed data cache line does not contain sr- 
ca is called a data cache miss. 


If bit 3 is a one, then the address in srca is used to address the data cache and the 
line selected is whatever line may be present (valid or not) at that location in the data 
cache. Which bits of srca give the data cache line address is implementation-dependent [9- 
5]. 

Bits 0 and 1 of srcb control the operation to be performed on the data cache line select- 
ed by the above process. In the case of a data cache miss, no operation is performed on this 
CPU’s data cache. If bit 1 is set, then any modified data in the data cache line is written back 
to memory. If bit 0 is set, then the data cache line is invalidated (a subsequent reference to 
the address in srca will result in a data cache miss). Any combination of these bits (with at 
least one of them set) may be selected. Care should be take when killing a line without vali- 
dating it as any modifications to the line will be lost. 


If bits O and 1 are both clear, then a special delayed write buffer flush is done. In 
this case, bits 2, 3, 4, and 5 must all be zero. The least-significant bits of srca give the num- 
ber of the delayed write buffer to be flushed. Delayed write buffers are used to perform store 
instructions more efficiently by delaying the store until the access can be verified in the data 
page table and the data cache tags. The ability to flush delayed write buffers is intended only 
for diagnostic purposes. The number of delayed write buffers is implementation-dependent 
[9-10]. 

Bit 2 of srcb, if set, forces bits 3, 4, and 5 to zero and causes the dflush request to be 
“broadcast” to all processors in the same multiprocessor system as the current processor. 
A dflush instruction is also broadcast whenever the data being referenced is shared and bits 
3, 4, and 5 of srcb are all clear. Note that this means that all dflush instructions which are su- 
pervisor mode references (except those with bits 3, 4, or 5 of srcb set) will be broadcast. 


The broadcast of a dflush, sometimes called a shared dflush, uses the cache coher- 
ence scheme to ensure data integrity across the cache’s of all the processors in a multipro- 
cessor system. All processors will be requested to perform the same operation on the same 
physical address. This broadcast happens even if there was a data cache miss on the access 
of srca and even if the page is not marked as shared in the data page table. The use of bit 2 
to enable the broadcast feature allows the operating system to remove areas of memory on 
which I/O is being performed from the data caches of all CPUs, even if those areas are not 
marked as shared in the data page table. 


Bits 4 and 5 of srcb are intended for use by diagnostic programs. Bit 4 causes any da- 
ta writeback that occurs to be directed to the ECC bits rather than the data bits. Bit 5 pre- 
vents the ECC bits from being written. See Appendix E for details of referencing the ECC 
bits in memory. Unpredictable results will occur if either bits 4 and 5 are both on at the same 
time, or bits 0 and 1 are not both on if either bit 4 or 5 is on. 


If the operation is in user mode (the User Mode Store bit is on in the Processor 
Status register), then bits 3, 4 and 5 of srcb will be forced to zero. 
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Instruction: zcl (srca) 

Opcode: FF 

Operation: Write zeroes to an entire cache line. 
Operands used: srca 

Results stored: mem(cache line size, srca) 

Legal in: User or Supervisor mode 
Functional unit: Load/Store 

Exceptions: zcl-type (see Table 5-1) 
Description: 


The address of a memory operand is given in srca. If the low-order bits of the ad- 
dress are not zero, then an illegal address trap will occur. The number of low-order bits that 
are checked is implementation-dependent [9-6]. The virtual address is mapped and the ac- 
cess is checked, possibly resulting in a data map miss or an illegal access trap. If no trap oc- 
curs, then the entire data cache line at the resultant physical address will be set to zero [9- 
7). If a data cache miss occurs, then no access will be made to memory to bring in the old da- 
ta cache line (since the line will be set to zero anyway). This instruction operates on the da- 
ta cache only. 


zcl is considered to be a store for purposes of access checking. Therefore, if the Us- 
er Mode Store bit is on in the Processor Status register, the operation is considered to be 
a user mode operation. In a user mode operation, the User Protection bit in the Proces- 
sor Status register controls the type of access granted. 


Note that while zcl is a very efficient means of clearing data in the data cache, it is 
only efficient for non-shared references. All shared references must go through the memory 
system, and thus shared zcls run at memory speed. Non-shared zcls can be done locally in 
each CPU, and will run significantly faster. See Appendix C for more timing information. 
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Instruction: pel (srca) 

Opcode: F7 

Operation: Preload cache line. 

Operands used: srca, mem(cache line size, srca) 
Results stored: none 

Legal in: User or Supervisor mode 
Functional unit: Load/Store 

Exceptions: none 

Description: 


The address of a memory operand is given in srca. The virtual address is mapped and 
the access is checked. If a map miss or access violation condition occurs, then the instruc- 
tion completes without doing anything (and no trap will be taken). If no miss or access viola- 
tion occurs, then the cache line containing the physical address will be read into the data 
cache. If the line is already in the data cache, then this instruction does nothing. If, in read- 
ing the line from the memory system, a non-existent memory or uncorrectable memory error 
is detected, then the operation is aborted and no error is reported. 


pel is considered to be a load for purposes of access checking. Therefore, if the User 
Mode Load bit is on in the Processor Status register, the operation is considered to be a 
user mode operation. In a user mode operation, the User Protection bit in the Processor 
Status register controls the type of access granted [9-8]. 
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CHAPTER 10. Trap Instructions 


These instructions cause traps, extract trap information from the processor after a 
trap occurs, and resume execution after a trap is handled. 
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10-2 Trap Instructions 10. 


Instruction: bpt 

Opcode: 10 

Operation: Cause a breakpoint trap. 
Operands used: none 

Results stored: none 

Legal in: User or Supervisor mode 
Functional unit: Decode 

Exceptions: bpt 

Description: 


This instruction causes a bpt trap to be taken. No following instructions will have 
been executed (i.e. this trap is precise). It is intended to be used by debuggers to cause exe- 
cution to be cleanly suspended at any desired point in the program. Note that to unambigu- 
ously determine whether it was an IO or an I] instruction that took a bpt trap, bpt instruc- 
tions should always use a 32-bit instruction format. 
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10. 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Trap Instructions 10-3 


xtrap 

13 

Cause an xtrap trap. 
none 

none 

Supervisor mode only 
Decode 


Illegal instruction/privilege violation, xtrap 


This instruction causes an xtrap trap to be taken. No following instructions will have 
been executed (i.e. this trap is precise). This instruction is intended to be used by the oper- 
ating system to enter Trap State prior to returning from a trap. If the processor is not in su- 
pervisor mode, then an illegal instruction/privilege violation trap occurs. 
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10-4 Trap Instructions 10. 


Instruction: strap 

Opcode: 12 

Operation: Cause a system call trap. 
Operands used: none 

Results stored: none 

Legal in: User or Supervisor mode 
Functional unit: Decode 

Exceptions: strap 

Description: 


This instruction causes an strap trap to be taken. No following instructions will have 
been executed (i.e. this trap is precise). This instruction is intended to be used by user pro- 
grams requesting operating system services. 
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10. Trap Instructions 10-5 


Instruction: trap 

Opcode: 11 

Operation: Cause a trap trap. 
Operands used: none 

Results stored: none 

Legal in: User or Supervisor mode 
Functional unit: Decode 

Exceptions: trap 

Description: 


This instruction causes a trap trap to be taken. No following instructions will have 
been executed (i.e. this trap is precise). This instruction is intended to be used by user pro- 
grams to indicate certain abnormal conditions to the operating system. 
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10-6 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Trap Instructions 10. 


rtrpd srca,rdst 
OF 

Read 64 bits of trap data. 
srca 

rdst 

Trap State only 

Integer 


Illegal instruction/privilege violation 


If the processor is in user mode, then an illegal instruction/privilege violation trap oc- 
curs. If the processor is in supervisor mode but not in Trap State, then the result will be 
unpredictable. A 64-bit unit of trap data is stored in rdst. Which unit of trap data is deter- 
mined by srca in an implementation-dependent fashion. See Appendix D for details on trap 


handling. 
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10. 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Trap Instructions 10-7 


slstrpd srca,srcb 

IC 

Store 64 bits of load/store trap data to memory. 
srca, srcb 

mem(8, srca) 

Trap State only 

Load/Store 


Illegal instruction/privilege violation, store-type (see Table 5-1) 


If the processor is in user mode, then an illegal instruction/privilege violation trap oc- 
curs. If the processor is in supervisor mode but not in Trap State, then the result will be 
unpredictable. The address of a memory operand is given in srca. A 64-bit unit of load/store 
trap data is stored in memory at the address specified by srca. Which unit of trap data is de- 
termined by srcb in an implementation-dependent fashion. See Appendix D for details on 
trap handling. The slstrpd instruction is subject to various traps associated with store in- 
structions, and traps when in Trap State will halt the processor. 
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10-8 Trap Instructions 10. 


Instruction: exts srca,address,rdst 
Opcode: 07 
Operation: Exit trap state, load a register from an absolute address which is first 


transformed to be unique to one CPU in a multiprocessor, and jump to 
an address in a register. 


Operands used: srca, mem(8, map(8*exts Load Address)) 

Results stored: rdst 

Legal in: Trap State only 

Functional unit: Fetch, Decode, Load/Store 

Exceptions: Illegal instruction/privilege violation, load-type (see Table 5-1) except 


no illegal address trap is possible 


Description: 


If the processor is in user mode, an illegal instruction/privilege violation trap occurs. 
If the processor is in supervisor mode but not in Trap State, the result will be unpredict- 
able. If the processor is in Trap State, several operations are performed simultaneously by 
this instruction. First, the contents of srca specify the address of the third following instruc- 
tion fetch. Second, a memory address is calculated as map(8*exts Load Address), which 
for this instruction includes an implementation-dependent transformation to make the ad- 
dress unique among all of the CPUs in a multiprocessor [10-1]. The 64-bit word at the re- 
sultant address will be written to rdst. (The exts Load Address field specifies bits 35..3 of 
the virtual address of the 64-bit word to be loaded into rdst; the other bits of the virtual ad- 
dress are set to zero.) Finally, before the instruction at the address specified by srca is exe- 
cuted, the processor will have exited Trap State. 


exts must be used in a very specific fashion to exit from Trap State. Once the first 
exts is executed, two more exts’s must be immediately executed with no intervening dis- 
abled instructions. Three sequential exts instructions are necessary to reload the three Re- 
start PCs. See the section on Traps, Interrupts, and Machine Checks in Chapter 2 for 
more information on the Restart PCs, and Appendix D for details on trap handling. 


This instruction must be used with the exts instruction format; an illegal instruc- 
tion/privilege violation trap will result from its use with any other format. The exts instruc- 
tion is subject to various traps associated with load instructions, and traps when in Trap 
State will halt the processor. The use of the exts format with either of bits 8 or 7 set to one, 
or with a delayed execution control field that disables either of the following two instructions, 
will produce unpredictable results. 
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CHAPTER 11. I/O Instructions 


I/O instructions are privileged instructions that are used to communicate control infor- 
mation to and from I/O processors. Also listed here are instructions for communication with 
the Console. 
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11-2 I/O Instructions 11. 


Instruction: rios srca,srcb,rdst 
Opcode: FE 

Operation: Read I/O status. 

Operands used: srca,srcb 

Results stored: rdst 

Legal in: Supervisor mode only 

Functional unit: Load/Store 

Exceptions: Illegal instruction/privilege violation 
Description: 


The operands srca and srcb are used to select an I/O status word. The selected I/O 
status is read and stored in the low-order 32 bits of rdst. The formats of srca, srcb, and of the 
I/O status read are described in Appendix F. The upper 32 bits of the result contain a status 
code indicating if the operation was successful or not. The status code is stored as a 2-bit 
number in bit positions 63 and 62. If the status code has the value 0, there were no errors 
(and the lower 32 bits of the result contain the I/O status). If the status code is non-zero, 
there was some type of error (and the lower 32 bits of the result should be ignored). 


If the status code has the value 2, the I/O system was busy and the operation timed 
out before it could be sent. If the status code has the value 3, the operation was sent, but 
timed out before a reply was received. If the status code has the value 1, there was a proto- 
col violation — the I/O system failed to respond to the rios request. 


This instruction is legal only in supervisor mode. If a user mode program attempts 
to execute this instruction, an illegal instruction/privilege violation trap will result. 
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EL I/O Instructions 11-3 


Instruction: wios srca,srcb,rdst 
Opcode: FD 

Operation: Write I/O status. 

Operands used: srca,srcb 

Results stored: rdst 

Legal in: Supervisor mode only 

Functional unit: Load/Store 

Exceptions: Illegal instruction/privilege violation 
Description: 


The operands srca and srcb are used to send control information to the I/O system. 
The formats of srca and srcb are described in Appendix F. For wios, the lower 32 bits of the 
result are not used (and should be ignored). As with rios, the upper 32 bits of the result con- 
tain a status code indicating if the operation was successful or not. The status code is stored 
as a 2-bit number in bit positions 63 and 62. If the status code has the value 0, there were 
no errors (and the wios was successful). If the status code is non-zero, there was some 
type of error (and the results are unpredictable — the wios may or may not have modified 
state in the I/O system). 


If the status code has the value 2, the I/O system was busy and the operation timed 
out before it could be sent. If the status code has the value 1, there was a protocol viola- 
tion — the I/O system failed to acknowledge the wios request. The status code returned by a 
wios should never have the value 3. 


This instruction is legal only in supervisor mode. If a user mode program attempts 
to execute this instruction, an illegal instruction/privilege violation trap will result. 
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Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


I/O Instructions 11. 


rfec srca,rdst 
E2 

Read Console data. 
srca 

rdst 

Supervisor mode only 
Load/Store 


Illegal instruction/privilege violation 


The Console data/control read port is read into the low-order 16 bits of rdst; the high- 
order 48 bits are set to zero. The low 2 bits of srca (srca<1..0>) control which Console read 
port is accessed — the low-priority read port (when srca<1..0> has the value 0), the debug- 
ger read port (when srca<1..0> has the value 1), or the high-priority read port (when sr- 
ca<1..0> has the value 2). The next bit of srca (srca<2>) controls whether the read is de- 
structive (if srca<2> is on), or non-destructive (if srca<2> is off). See Appendix G for de- 
tails on the interface with the Console. 


This instruction is legal only in supervisor mode. If a user mode program attempts 
to execute this instruction, an illegal instruction/privilege violation trap will result. 
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Li. 


Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


I/O Instructions 11-5 


wfec srca 
F2 

Write Console data. 
srca 

none 

Supervisor mode only 
Load/Store 


Illegal instruction/privilege violation 


The low-order 16 bits of the srca operand are written into the Console data/control 
write port, and an interrupt may be sent to the Console. The write port must not be full or un- 
predictable results will occur. See Appendix G for details on the interface to the Console. 


This instruction is legal only in supervisor mode. If a user mode program attempts 
to execute this instruction, an illegal instruction/privilege violation trap will result. 
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CHAPTER 12. Miscellaneous Instructions 


These instructions are used to perform certain miscellaneous control functions. 
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Instruction: 
Opcode: 


Operation: 


Operands used: 


Results stored: 


Legal in: 


Functional unit: 


Exceptions: 


Description: 


Miscellaneous Instructions 


halt 

01 

Halt. 

none 

none 

Supervisor mode only 
Decode 


Illegal instruction/privilege violation 


12. 


If the processor is in user mode, an illegal instruction/privilege violation trap occurs. 
If the processor is in supervisor mode, it is halted. Execution may be continued at the in- 
struction following the halt by the Console. Note that it is possible to continue from a halt in 


Trap State. 
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12. Miscellaneous Instructions 12-3 


Instruction: nop 

Opcode: 00 

Operation: No operation. 

Operands used: none 

Results stored: none 

Legal in: User or Supervisor mode 
Functional unit: Decode 

Exceptions: none : 
Description: 


No operation is performed. If the long constant instruction format is used, several 
special delay features are invoked. 


Several bits in the LCON field of a long constant format nop instruction can suspend 
execution until certain outstanding operations complete. These bits operate independently, 
so that if multiple bits are set, execution will be suspended until the last of the events com- 
pletes. 


Bit 19 of the LCON field, if set, cause the nop instruction’s issue to be delayed until 
the load/store unit’s pipeline is empty. In other words, the nop will issue no sooner than 
would an instruction that used a result from the /ast instruction sent to the load/store unit. 
Bit 18 of the LCON field has the same effect for the floating-point divide/square root unit, bit 
17 for the floating-point multiply unit, and bit 16 for the floating-point add unit. 


Certain memory requests, such as writing back a cache line due to a dflush, run “in 
the background” without holding up the load/store unit pipeline. Bit 20 of the LCON field, if 
set, causes the nop instruction’s issue to be delayed until any outstanding memory request 
(other than an instruction cache miss read) is complete. This feature is most useful for I/O 
interactions, where it is necessary to wait for memory to be updated before initiating an I/O 
transfer. 


In addition to bits 20 to 16, the low three bits of the LCON field of a long constant 
format nop instruction specify the minimum number of extra cycles (from 0 to 7) that the nop 
will take. If the low three bits are zero, then the nop will issue (barring any flag interlocks 
and the effects of bits 20 to 16) in one cycle. 


See Appendix C for more details on instruction timing. 
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CHAPTER 13. Undefined Opcodes 


All opcodes not defined in the individual instruction descriptions are illegal; executing 
one will give unspecified results. In the initial K-1 implementation, these opcodes cause an 
illegal instruction/privilege violation trap. 


Amdahl Key Computer Labs CONFIDENTIAL K-1 Architecture Manual 


Appendix A. Instruction Index (Alphabetic) 


add SA 6-2 cmpugt.b 2B 6-25 
add.d 90 4-16 cmpugt.d 39 4-3 
add.s 80 4-17 cmpugt.h 2A 6-26 
addc 58 6-3 cmpugt.1 28 6-28 
addt 59 6-4 cmpugt.s 38 4-4 
and 63 6-40 cmpugt.w 29 6-27 
andcc 65 6-42 cmpun.d 3D 4-3 
andtc 69 6-41 cmpun.s SC 4-4 
ashf 61 6-51 cvtd.l 95 4-8 
cvtd.s 94 4-6 
cvtl.d . 97 4-10 
cvtl.s 96 4-12 
bitcnt.w 5D 6-56 cvts.d 84 4-7 
bitrev 5C 6-55 cvts.1 85 4-9 
boof 2F 6-60 cvtul.d 98 4-11 
boof 4F 6-59 
bpt 10 10-2 
dflush 1D 9-7 
div.d A3 4-22 
call 05 7-3 div.s A2 4-23 
call 09 7-2 divssr A6 6-15 
chk.b 50 6-62 divsst A4 6-14 
chk.h 51 6-63 dshfl 6D 6-52 
chk.w 52 6-64 dshfr 6E 6-53 
chku.b 53 6-65 
chku.h 54 6-66 
chku.w 55 6-67 
cmpeq.b 27 6-17 echk F3 5-50 
cmpeq.d 3B 4-3 eload.b :1 El 5-28 
cmpeq.h 26 6-18 eload.b EO 5-29 
cmpeq.| 24 6-20 eload.h :1 E5 5-30 
cmpeq.s 3A tat eload.h :2 E6 5-30 
cmpeq.w 25 6-19 eload.h E4 5-31 
cmpge.d 37 4-3 eload.| :1 ED 5-26 
cmpge.s 36 4-4 eload.1 :8 EE 5-26 
cmpgt.b 23 6-21 eload.1 EC 5-27 
cmpgt.d 31 4-3 eload.w :1 E9 5-32 
cmpgt.h 22 6-22 eload.w :4 EA 5-32 
cmpgt.l 20 6-24 eload.w E8 5-33 
cmpgt.s 30 44 eload[u].1 :1 ED 5-26 
cmpgt.w 21 6-23 eload[u].1 :8 EE 5-26 
cmpleg.d 35 4-3 eload{u].1 EC 5-27 
cmpleg.s 34 44 eloadu.b :1 Fl 5-20 
cmplg.d 33 4-3 eloadu.b FO 5-21 
cmplg.s 32 44 eloadu.h :1 F5 5-22 
cmpueq.d 3F 4-3 eloadu.h :2 F6 5-22 
cmpueq.s 3E 44 eloadu.h F4 5-23 
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eloadu.| :1 ED 5-26 Izent SE 6-57 
eloadu.! :8 EE 5-26 
eloadu.| EC 5-27 
eloadu.w :1 F9 5-24 
eloadu.w :4 FA 5-24 move 4B 6-30 
eloadu.w F8 5-25 move.d 93 6-31 
exts 07 10-8 mult.d Al 4-20 
mult.s AO 4-21 
multhss A7 6-9 
multhus AF 6-13 
halt 01 12-2 multhuu AB 6-11 
multlss A5 6-8 
multlus AD 6-12 
multluu A9 6-10 
ickill 06 9-4 
iskill 16 9-6 
neg.d 92 4-14 
neg.s 82 4-15 
jump 04 7-5 nop 00 12-3 
jump 08 74 
or 6A 6-43 
Idecc DC 5-47 orce 6C 6-45 
Idnecc BC 5-48 ortc 6F 6-44 
Idpage 19 9-3 
lipage 18 9-2 
load.b :1 cl 5-14 
load.b Co 5-15 pel F7 9-10 
load.h :1 C5 5-16 
load.h :2 C6 5-16 
load.h C4 5-17 
load.1 :1 CD 5-12 relf C2 5-51 
load.1 :8 CE 5-12 rfec E2 114 
load.1 cc 5-13 rios FE 11-2 
load.w :1 C9 5-18 rot 66 6-49 
load.w :4 CA 5-18 Ips OB 8-2 
load.w C8 5-19 rtrpd OF 10-6 
load[u].1 :1 CD 5-12 rut 5F 8-6 
load[u].1 :8 CE 5-12 
load[u].1 CE 5-13 
loadcpu 8D 545 
loadu.b :1 D1 5-6 sel f0 40 6-38 
loadu.b DO 5-7 sel f1 41 6-38 
loadu.h :1 DS 5-8 sel f2 42 6-38 
loadu.h :2 D6 5-8 sel f3 43 6-38 
loadu.h D4 5-9 sel f4 44 6-38 
loadu.! :1 CD 5-12 sel £5 45 6-38 
loadu.1 :8 CE 5-12 sel £6 46 6-38 
loadu.| CC 5-13 sext.b 4C 6-35 
loadu.w :1 D9 5-10 sext.h 4D 6-36 
loadu.w :4 DA 5-10 sext.w 4E 6-37 
loadu.w D8 5-11 shf 60 6-50 
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store.b :1 
store.b 
store.h :1 
store.h :2 
store.h 
store.] :1 
store.1 :8 
store.] 
store.w :1 
store.w :4 
store.w 
storecpu 
strap 

sub 

sub.d 
sub.s 
subb 
subt 

swat 


trap 
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5-54 
5-52 
11-5 
11-3 
8-7 
8-3 


6-47 
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00 = nop 12-3 31  cmpgt.d 4-3 
01 shalt 12-2 32 ~~ cmplg.s 4-4 
04 jump 7-5 33 cmplg.d 4-3 
05 ~~ call 7-3 34 ~cmpleg.s 4-4 
06 ickill 94 35 cmpleg.d 4-3 
07 ~—exts 10-8 36 ~=cmpge.s 4-4 
08 jump 714 37 ~cmpge.d 4-3 
09 ~~ call 7-2 38  cmpugt.s 4-4 
OB rps 8-2 39  cmpugt.d 4-3 
OC  wps 8-3 3A cmpeq.s 4-4 
OD spl 8-4 3B ~—s cmpeq.d 4-3 
OF  rtrpd 10-6 3C cmpun.s 4-4 


3D ~=cmpun.d 4-3 
3E  cmpueq.s 4-4 
3F cmpueq.d 4-3 


10 ~—sbpt 10-2 
11 strap 10-5 
12 strap 10-4 
13. xtrap 10-3 40 sel f0 6-38 
16 __ iskill 9-6 41 sel fl 6-38 
18 __lipage 9-2 42 — sel f2 6-38 
19 = Idpage 9-3 43 sel f3 6-38 
1C _ slstrpd 10-7 44 sel f4 6-38 
1D dflush 9-7 45 sel f5 6-38 
1F_ wit 8-7 46 sel [6 6-38 
48 — subb 6-6 
49 subt 6-7 
4A — sub 6-5 
20 = cmpgt.l 6-24 4B move 6-30 
21  cmpgt.w 6-23 4C _ sext.b 6-35 
22 + ~ cmpgt.h 6-22 4D — sext.h 6-36 
23s cmpgt.b 6-21 4E — sext.w 6-37 
24 ~=cmpeq.l 6-20 4F  boof 6-59 
25 = cmpeq.w 6-19 
26  cmpeq.h 6-18 
27 ~~ cmpeq.b 6-17 
28 cmpugt.l 6-28 50 —_ chk.b 6-62 
29 ~=cmpugt.w 6-27 51 chk.h 6-63 
2A  cmpugt.h 6-26 52 chk.w 6-64 
2B —scmpugt.b 6-25 53 chku.b 6-65 
2C —zext.b 6-32 54 chku.h 6-66 
2D _zext.h 6-33 55 chku.w 6-67 
2E = zext.w 6-34 56 srm 8-5 
2F _boof 6-60 58 addc 6-3 
59 addt 6-4 
5A add 6-2 
SC _bitrev 6-55 
30 cmpgt.s 44 5D _bitcnt.w 6-56 
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SE _Izcnt 6-57 AF multhus 6-13 
5F rut 8-6 
B2 _ sqrt.s 4-25 
60 — shf 6-50 B3 _ sqrt.d 4-24 
61 —_ashf 6-51 BC _Idnecc 5-48 
63 = and 6-40 
65 —_andcc 6-42 
66 ~—srot 6-49 
67  ~=xnor 6-47 CO load.b 5-15 
68  xor 6-46 Cl _—load.b:1 5-14 
69 —_andtc 6-41 C2 ~—sreilf 5-51 
6A or 6-43 C3 _ store.b 5-36 
6C orcc 6-45 C4 —load.h 5-17 
6D dshfl 6-52 C5 _—load.h:1 5-16 
6E — dshfr 6-53 C6 ~—load.h :2 5-16 
6F  ortc 6-44 C7 - store.h 5-38 
C8 load.w 5-19 


C9 _—load.w:1 5-18 
CA _ load.w :4 5-18 


80 adds 4-17 CB - store.w 5-40 
81 sub.s 4-19 CC _ load[u].l 5-13 
82 _—sineg.s 4-15 CD Ioad{uJ.l:1 5-12 
84 cvts.d 4-7 CE load[u].1:8 5-12 
85 _evts.l 4-9 CF - store.] 5-42 
8D ___loadcpu 5-45 
8F _ storecpu 546 
DO —_loadu.b 5-7 
D1 _—_loadu.b:1 5-6 
90 = add.d 4-16 D2 welf 5-52 
91 sub.d 4-18 D3 _ store.b:1 5-35 
92 ~—sneg.d 4-14 D4 _—loadu.h 5-9 
93 ~=move.d 6-31 DS  loadu.h:1 5-8 
94 ~—sevtd.s 4-6 D6 —_loadu.h :2 5-8 
95 evtd.l 4-8 D7 © store.h:1 5-37 
96s cvtl.s 4-12 D8 __loadu.w 5-11 
97 ~—sevtil.d 4-10 D9 —loadu.w :1 5-10 
98 — cvtul.d 4-11 DA _loadu.w :4 5-10 
DB _ store.w:1 5-39 
DC _ldecc 5-47 
DF _ store.l:1 5-41 
AO mults 4-21 
Al mult.d 4-20 
A2_ div.s 4-23 
A3 _ div.d 4-22 EO eload.b 5-29 
A4 _— divsst 6-14 El — eload.b:1 5-28 
A5  multlss 6-8 E2  rfec 114 
A6 __ divssr 6-15 E4 ~ eload.h 5-31 
A7  multhss 6-9 ES — eload.h:1 5-30 
A9 = multluu 6-10 E6 ~ eload.h :2 5-30 
AB multhuu 6-11 E7 _ store.h :2 5-37 
AD multlus 6-12 E8 — eload.w 5-33 
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E9 ~ eload.w :1 5-32 
EA _ eload.w :4 5-32 
EB _ store.w :4 5-39 
EC _ eload{[u].1 5-27 
ED eload[u].1:1 5-26 
EE eload[u].1:8 5-26 
EF _ store.1:8 541 


FO — eloadu.b §-21 
Fl — eloadu.b:1 5-20 
F2~~_—wfec 11-5 
F3 —echk 5-50 
F4 _—_eloadu.h 5-23 
F5  eloadu.kh:1 5-22 
F6 eloadu.h:2 5-22 


F9  eloadu.w:1 5-24 
FA eloadu.w:4 = 5-24 


FB wdwp 5-54 
FC swat 5-44 
FD wios 11-3 
FE _rios 11-2 
FF zcl 9-9 
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Appendix C. Instruction Timing Considerations 


This appendix contains information that is applicable to the Version 1 implementation 
of the K-1 architecture. Future versions of the K-1 processor may make different implemen- 
tation decisions, but should affect only those topics that are covered in this appendix. 


C.1 Clock Cycles 


The K-1 processor, memory, and I/O systems operate synchronously and are timed 
by a constant frequency main clock generator. All operations described in this appendix take 
an integral number of clock periods (usually referred to as clock cycles, or just cycles) to 
complete. The speed of the clock is given either as a frequency or as the duration of a single 
clock cycle. In the Version 1 implementation, the clock cycle is 6.5 nanoseconds. 


C.2 Instruction Issue 


Instruction words are fetched from the instruction stache according to the program 
flow. As explained in Chapter 2, an instruction word may contain a single 64-bit IO instruc- 
tion, or both 32-bit IO and I] instructions. Each instruction in an instruction word is decoded 
and, if appropriate, is issued for execution. To issue an instruction means to send the in- 
struction to a particular functional unit to be executed. In this context, the fetch and decode 
units are considered to be functional units. Some number of clock cycles later, the functional 
unit will write its result (if any) to the register file and/or will cause a trap. Note that while 
the functional units send their results to the register file at the same time as they signal a 
trap, instructions that use the results of a trapping instruction may be issued for two cycles 
before the trap “takes effect” and stops instruction issue. 


An important feature of the K-1 architecture is that, other than for trapping, it pre- 
serves the semantics of a “sequential” machine (one that issues no more than one instruc- 
tion per clock cycle and waits for that instruction to complete before issuing another). When- 
ever the K-1 issues more than one instruction in a single cycle, it guarantees that those in- 
structions do not conflict with each other in any way. Thus, instructions will never be issued 
“out of order” (in a different order than they would have been issued on a sequential ma- 
chine), though often instructions that would have been issued on consecutive cycles on a se- 
quential machine can be issued at the same time by the K-1. 


The Version 1 implementation of the K-1 is capable of issuing two 32-bit instructions 
from the same instruction word, or one 64-bit instruction, in a single clock cycle. Conditions 
that can delay instruction issue are explained below. 
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C.3 Pipelining 


The functional units in the K-1 are heavily pipelined. This means that they typically 
take more than one clock cycle to complete a single operation, but that they can begin a new 
operation every cycle. If a functional unit has a pipeline of length three, it can be doing three 
separate computations at the same time. Of course, the computations pass through the 
three separate phases of the calculation sequentially and independently of one another. 


In order to take advantage of the pipelining of functional units in the K-1, it is neces- 
sary to issue instructions as soon as they become ready. In the optimal case, two instruc- 
tions will be issued each cycle. At most one instruction can be issued to a given functional 
unit in any given cycle. Therefore, a pipelined functional unit is completely busy if it starts a 
new operation every cycle. The floating-point divide/square root unit is not pipelined; a maxi- 
mum of two divides and one square root can be in progress at any one time. (See the sub- 
section on the Floating-Point Divide/Square Root Unit.) 


C.4 Functional Unit Latency and Interlocks 


If an instruction is issued to a functional unit with a long pipeline, it is possible that 
the program may proceed to a point where it wishes to use the result of that instruction be- 
fore the functional unit has written the result to the register file. Before describing how the 
K-1 deals with such a situation, two terms must be defined. 


The latency of a functional unit is defined to be the number of cycles that must pass 
after the issue of a (result-returning) instruction to that functional unit before the result can 
be used. For example, in the following program: 


add.d %r1,%r2,%r3 (0) 
<instruction word> (1) 
<instruction word> (2) 
<instruction word> (3) 
<instruction word> (4) 
<instruction word> (5) 
move %r3,%r4 (6) 


the move instruction is in the first position which can use the result of the add.d instruction. 
Since the move instruction can not issue sooner than relative cycle 6, the latency of the float- 
ing-point add functional unit must be 6. The latencies for the K-1 functional units are given 
in Table C-1. 


If the move instruction in the above example was located at cycle 0 through 5 relative 
to the add.d, then it would be ready to issue before all its operands were ready. A hardware 
mechanism provided to detect this situation and delay the issue of the instruction until the 
operands are available is called an interlock. Interlocks can also delay the issue of instruc- 
tions for reasons other than the unavailability of register operands. Interlocks are provided 
wherever they are needed. That is, a program will never be incorrect because it failed to wait 
long enough before using the result of a previous instruction. 
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Table C-1. Functional Unit Latencies 


Integer 
Load/Store 


Floating-Point Add 

Floating-Point Multiply 8 
Floating-Point Divide 20,25, 35 $ 
Floating-Point Square Root 32, 61 


C.5 K-1 Interlocks 


This section enumerates the different types of interlocks in the K-1 and explains their 
effects on instruction timing. 


C.5.1 Register Interlocks 


Register interlocks are generated when an attempt is made to issue an instruction 
when one or more of its source or destination registers is currently reserved. A register is 
reserved when it is scheduled to be written by a previously issued but not yet completed in- 
struction. An interlock begins on the cycle in which an instruction that writes a register is is- 
sued, and remains in place until the instruction completes. The duration of an interlock (in cy- 
cles) is the same as the functional unit latency of the instruction destined to write the regis- 
ter. An instruction cannot issue until all the registers it reads and writes are free of 
interlocks. (The exceptions to this rule are explained below). 


The integer, floating-point multiply, and floating-point add functional units each have 
a fixed latency regardless of the type of operation the unit is performing. For example, all 
floating-point adds (both single and double precision) have a latency of six. Thus, any at- 
tempt to read or to write the destination register of a floating-point add will interlock until 
the add completes (the sixth cycle after the add was issued). 


The load/store functional unit has a fixed latency for most normal load/store instruc- 
tions that complete without data cache misses or traps. If there is a data cache miss, then 
the load/store functional unit pipeline is stopped until the miss is resolved (either by bringing 
the proper line into the data cache, or by trapping). Thus, load/store operations always com- 
plete in order. Data cache bank conflicts and store byte operations in some circumstances 


t For load and eload instructions that hit in the data cache and complete without trapping. 


¥ 20 for single precision, 25 for integer, and 35 for double precision operations. 
V 32 for single precision, 61 for double precision. 
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can also extend the latency of load/store operations. Some special load/store instructions al- 
ways have a longer latency. See the section on Load/Store Timing for more information. 


The floating-point divide/square root functional unit is the only functional unit that has 
different latencies for different operations. In fact, divide/square root instructions can finish 
out of order with respect to other divides/square roots. For example, a double precision for- 
mat floating-point divide could be started at cycle 0, and a single precision format floating- 
point divide could be started at cycle 10. The single precision divide would finish at cycle 29, 
five cycles before the double precision divide. 


Register interlocks from the integer functional unit are handled specially. The integer 
functional unit implements wrapping. Wrapping allows the integer functional unit to utilize 
prior results without having to wait for them to be written into the register file. Thus, a se- 
quence such as 


add %r1,%r2,%r4 
add %r3,%r4,%r5 


does not pay the usual penalty for functional unit latency — the second add can be issued on 
the cycle following the first, even though it uses a result produced by the first. In effect, there 
are no interlocks on the results of integer instructions when they are used as operands to the 
integer functional unit. An attempt by another functional unit to use a result from the integer 
functional unit will still result in an interlock. Wrapping does not apply to the results or to 
the operands of the call, rtrpd, wps, rps, spl, srm, rut, and wit instructions. Also, the srca 
operand of the dshfl and dshfr instructions cannot be wrapped. 


Another way in which the integer functional unit is treated specially is with regard to 
register “write” interlocks. A write interlock occurs when one instruction tries to write a 
register that is already scheduled to be written by an earlier instruction. In order to guaran- 
tee that the results will be written in the proper order, the second instruction is held up until 
the write of the first instruction completes. But, if the first instruction was issuing in the inte- 
ger functional unit, then it is guaranteed to complete its write before the second — the integer 
functional unit has a shorter latency than any other functional unit. While this optimization 
could be done for all functional units, it would then greatly impact the complexity of the wrap- 
ping control logic. Thus, if an integer instruction tries to write a register that is already 
scheduled to be written by the integer functional unit, no interlock is generated. 


C.5.2 Flag Interlocks 


No interlock is generated for reading or writing a flag in the integer functional unit 
(i.e., as an argument of a cmp, boof, addc, subb, or sel instruction). This is because the 
flags are maintained in the integer unit and thus there is no latency to access them within 
that unit. 


When flags are used for conditional execution (including branching), the latency of the 
integer functional unit must be taken into account. The use of a flag for conditional execution 
(or branching) will be delayed until the fifth cycle following the instruction that set the flag, 
as illustrated below. 
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cmpgt.l %r1,%r2,%f3 (0) 
<instruction.word> (1) 
<instruction word> (2) 
<instruction word> (3) 
<instruction word> (4) 
move <%f3> %r3,%r4 (5) 


C.5.3 11/10 Interlocks 


In the best case, two 32-bit instructions in a single instruction word can be issued in 
parallel. A number of 11/10 interlocks, however, may prevent this from happening. The 
causes of I1/I0 interlocks are enumerated and explained in the following subsections. I1/10 
interlocks always cause the I1 instruction to wait until the interlock condition is removed — 
they never delay the issuing of an IO instruction. 


C.5.3.1 I0 Interlocked 


Whenever the IO half of an instruction word is interlocked, the I1 instruction is also in- 
terlocked. This prevents the I0 and I1 instructions from issuing out of order. 


C.5.3.2 Serial Instructions 


Certain instructions are serialized by the hardware because they affect machine state 
in more than one functional unit. These serial instructions are halt, wps, and srm. Whenev- 
er either of IO or I1 is a serial instruction, they must issue in separate cycles. 


C.5.3.3 [1/10 Read Port Conflicts 


Since the functional units in the K-1 are only capable of accepting one operation per 
clock cycle, there will be an I1/I0 interlock whenever the IO and I1 instructions attempt to use 
the same functional unit. The actual cause of this interlock is slightly more complicated. 


The register file in the K-1 has five read ports which provide the srca and srcb oper- 
ands to the functional units. Each read port is connected to one or more functional units. If 
the IO and I1 instructions need to use the same read port, this prevents the I1 instruction 
from issuing until after the I0 instruction issues. The floating-point add, floating-point multi- 
ply and load/store units each have their own read port. The integer, floating-point di- 
vide/square root, and fetch units share a read port (but there are very few instructions issued 
to the fetch unit). Thus, for example, if both the IO and I1 instructions use the floating-point 
add functional unit, the I1 instruction will interlock until after the IO instruction issues. Or, if 
the I0 instruction uses the integer functional unit and the I1 instruction uses the floating- 
point divide/square root functional unit, the I1 instruction will interlock until after the IO in- 
struction issues. 


The lipage instruction is issued to the fetch unit on the same read port as the floating- 
point divide/square root and integer units, and thus cannot be issued at the same time as any 
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floating-point divide/square root or integer operation. Note that branches to an address in a 
register also issue to the fetch unit; these instructions cannot, however, interlock with float- 
ing-point divide/square root or integer instructions because all non-PC-relative branches are 
in 64-bit instruction formats. 


In addition to the normal functional unit read ports, there is a special read port that is 
used only for store and dshfl/dshfr instructions. This read port allows these instructions to 
read three registers instead of the usual two. Because of this, a store cannot issue at the 
same time as a dshfl or dshfr. 


C.5.3.4 I1/I0 Register and Flag Conflicts 


Since the K-1 preserves the semantics of a sequential machine, an I1 instruction that 
reads or writes a register (or a flag) that is written by the IO instruction in the same instruc- 
tion word will interlock. This interlock is a special case of the register and flag interlocks de- 
scribed above, and serves to prevent the I1 instruction from issuing until after the IO instruc- 
tion has issued (at which point, the issue of the I1 instruction will be blocked by a register or 
flag interlock, unless they are both integer unit instructions). 


C.5.4 Floating-Point Divide/Square Root Unit 


The floating-point divide/square root unit is capable of executing only three instruc- 
tions at one time — two divide instructions and one square root instruction. An attempt to is- 
sue a third divide before one of the two outstanding divides completes results in an inter- 
lock. An attempt to issue a second square root before the outstanding square root completes 
also results in an interlock. These interlocks last three cycles longer than the interlock on 
the result of the first completing divide (or square root). (I.e., the result of the floating-point 
divide/square root instruction is available three cycles before another such instruction can be 
issued.) Note that floating-point divide/square root unit instructions can complete out of or- 
der from each other, as, for example, when a single precision floating-point divide is issued 
the cycle after a double precision floating-point divide. 


Floating-point divide and square root instructions can complete in less time than 
specified in Table C-1. Invalid operands, for example, will be detected almost immediately 
by the floating-point divide/square root unit, and the instruction will complete and return a 
trap code without waiting for the usual latency to elapse. 


There is an additional interlock after a divide instruction (div.s, div.d, divsst, divssr) 
is issued — no divide or square root instructions can be issued on the two following cycles. 
Similarly, after a square root instruction (sqrt.s, sqrt.d), no divide or square root instructions 
can be issued on the following cycle. The hardware detects these cases, and will not issue 
any divide or square root instruction on the two cycles following the issue of a divide instruc- 
tion, or the cycle following the issue of a square root instruction. 


There are also additional delays when the results of divide or square root instructions 
are scheduled to complete at (or near) the same time. For example, a single precision divide 
issued five cycles after an integer divide would be scheduled to complete at the same time as 
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the integer divide. In cases such as this, one of the divides will complete a cycle later. The 
details of this are TBD. 


C.5.5 Load/Store Unit 


The load/store unit is capable of executing only eight instructions at one time. An at- 
tempt to issue a ninth load/store instruction before one of the eight outstanding ones com- 
pletes results in an interlock. (Note that load/store instructions always complete in order.) 
One cause of this interlock is that certain load/store instructions, such as wps, wait for all 
the load/store instructions in front of them to complete before they proceed (and thus cause 
the load/store queue to back up behind them). This interlock can also be caused by data 
cache misses, and other dynamically varying events. If, for example, a load misses in the da- 
ta cache, the load/store functional unit pipeline will stop until the data cache miss is satis- 
fied. The dynamic nature of some of the delays makes this interlock difficult to predict stati- 
cally. This may lead to program execution stopping for one of two reasons: an instruction 
tries to read or write the register that is being loaded, causing a register interlock, or, a ninth 
load/store instruction (the eighth after the “missing” instruction) tries to issue, causing a 
load/store unit interlock. (See the section on Load/Store Timing for more details.) 


C.6 Branching 


In addition to register and flag interlocks, branches can also experience additional de- 
lays. 


C.6.1 Branching to a PC-Relative or Absolute Address 


Branching per se incurs no absolute penalty measurable as a number of clock cycles. 
There are, however, dynamic penalties incurred by branching that are difficult to predict stati- 
cally. To understand these penalties, a brief explanation of the K-1 hardware is necessary. 


The instruction stache is a small cache that can be accessed very quickly. Branches 
that hit in the instruction stache incur absolutely no penalty. Branches that miss in the in- 
struction stache, however, may or may not incur a penalty depending on how backlogged the 
issue unit is, and whether the “missing” data is in the instruction cache or must be retrieved 
from main memory. 


Filling one line (64 bytes) of the instruction stache from the instruction cache takes 
only four additional cycles. This delay, however, does not necessarily slow down a running 
program. The fetch unit (which is responsible for retrieving instructions in the proper order) 
operates autonomously from the issue unit (which is responsible for issuing instructions in 
the proper order). Barring such events as instruction stache missing (and a few exceptional 
instructions such as branches to an address in a register, wps, lipage, etc.) the fetch unit will 
retrieve one instruction word from the instruction stache each clock cycle. Since four addi- 
tional cycles are required for an instruction stache miss, the desired instruction word is re- 
trieved at cycle i+4, instead of cycle i. 
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A queue of four instruction words, called the Instruction Queue, lies between the 
fetch and issue units. Normally, the fetch unit will be ahead of the issue unit (because of in- 
terlocks which slow down the issue process). While the fetch unit is handling an instruction 
stache miss, the issue unit can continue executing out of the Instruction Queue. Thus, a 
branch that misses in the instruction stache will only incur a cycle time penalty if the issue 
unit is able to empty the Instruction Queue before the fetch unit can retrieve the “missing” 
data. 


Filling the instruction cache from main memory takes roughly 60 cycles, but this time 
is variable due to memory interference from other processors, different RAM speeds in the 
memory subsystem, etc. In this case, it is almost certain that the Instruction Queue will 
empty and some number of cycles will be lost. The large size of the instruction cache (1 
MB), however, makes an instruction cache miss a rare event. 


C.6.2 Branching to an Address in a Register 


In addition to the general branch penalty described above, branching to an address in 
a register incurs an additional three cycle penalty in order to retrieve the contents of the reg- 
ister and affect the Program Counter. A further potential penalty comes from the fact that 
when a branch to an address in a register issues, there will be at most one other instruction 
word in the Instruction Queue. 


C.7 Special Instructions 


A number of instructions in the K-1 have special timing rules/constraints. These in- 
structions are described below. Other than load/store instructions, which are described in 
the section on Load/Store Timing, any instruction not described below can be assumed to 
issue in a single cycle and to return its result (if any) after the latency of the functional unit in 
which it executes. Note that this section only considers instructions that have special timing 
rules and does not consider interlock issues (discussed in the section on Interlocks), or 
branch instructions (discussed in the section on Branching). 


C.7.1 nop 


nop instructions that are conditionally executed experience the same flag interlock as 
any other instruction. A 32-bit (register or short constant format) nop in the same instruc- 
tion word as another instruction can be issued in parallel with any other instruction. Barring 
flag interlocks, two 32-bit nop instructions in the same instruction word will issue in the 
same cycle. 


nop instructions in long constant format are treated specially. See the nop instruc- 
tion description for details. 
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C.7.2 rps, wps, spl, and srm 


The wps and srm instructions cannot be issued in parallel with any other instruction 
since they are serial instructions. In addition, no other instructions will issue until the fifth 
cycle following the issue of a register or short constant format IO wps or the sixth cycle fol- 
lowing the issue of a long constant format I0 or an I1 wps. (This additional penalty, which 
accounts for the time required to update the flags in the fetch and issue units, does not apply 
to srm or spl). 


The rps instruction will not issue until all the arithmetic (integer, floating-point add, 
floating-point multiply, and floating-point divide/square root) functional unit pipelines have 
emptied. This allows the correct Arithmetic Exception Flags<4..0> field of the Processor 
Status register to be read. The timing of rps is such that it can be issued one cycle before 
the results of all the outstanding floating-point or integer instructions are available. (For in- 
structions that do not return results, as for some integer instructions such as wit, rps will 
wait for these instructions as if they returned a result that the rps read.) Furthermore, rps 
does not have to wait any additional time for compare instructions (or other instructions that 
modify the flags). The integer unit is responsible for maintaining the flags and for returning 
the flag portion of the Processor Status register, so that any outstanding flag-modifying in- 
structions will complete before the rps. To clarify these delays, a rps cannot be issued until 
the second cycle following the issue of an integer instruction, the fifth cycle following the is- 
sue of a floating-point add instruction, or the seventh cycle following the issue of a floating- 
point multiply instruction. That is, the rps waits one cycle less than it would if it were wait- 
ing for the result of the previously issued instruction(s). 


The spl instruction is intended to be used only by the operating system in very special 
circumstances. Though it modifies the Processor Status register, it incurs no additional ex- 
ecution time penalties. But, the effect of the spl is delayed until the fifth cycle following its is- 
sue. If, for example, the priority level was being changed from 15 (the lowest priority level) 
to 2, it would be necessary to wait four cycles after issuing the spl to guarantee that no inter- 
rupts lower in priority than level 2 were about to occur. Note, however, that waiting only 
three cycles after issue of the spl is enough to guarantee that critical code is not interrupted 
after it has begun. The fourth cycle after the spl could receive an interrupt at the old level, 
but the instruction would be interrupted before it was issued, so that the critical section 
could be restarted. This three cycle wait can be accomplished with a long constant format 
nop with an operand of 2 immediately after the sp]. Note that when changing from a high pri- 
ority level to a lower one, no wait is needed since lower priority level interrupts are already 
blocked. 


C.7.3 Trap Instructions and Instructions Which Trap 


The trap instructions (bpt, trap, strap, and xtrap) all cause execution of the current 
process to be suspended after their issue. It does, however, require some amount of time to 
begin executing instructions in Trap State. The bulk of this penalty will be due to the time 
required to start fetching the instructions of the trap handler (which may involve an instruc- 
tion cache miss). If the trap handler is resident in the instruction cache, however, there is 
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just a 12 cycle penalty to enter and start executing in Trap State. That is, if the trap instruc- 
tion is issued at time i, the first instruction in Trap State will be issued at time i+/3. 


When instructions that have been issued to the functional units trap, the trap will sus- 
pend further issuing on the second cycle following the cycle when the result of the instruction 
could have been used. If instruction issue is suspended on cycle i by a functional unit trap, 
then the first instruction in Trap State will be issued at time i+/2. 


The time required to enter Trap State will be slightly longer when the functional unit 
pipelines are full when the trap is detected. The integer, floating-point add, and floating- 
point multiply pipelines will drain completely before Trap State is entered (and thus do not 
delay entering Trap State), but the load/store and floating-point divide/square root pipelines 
may not empty before Trap State is entered (due to their longer latencies). The issue unit 
will not issue the first Trap State instruction until the fourth cycle after all the pipelines 
have emptied. 


C.7.4 exts 


The exts instruction has the same timing as a branch to register combined with a 
load.l. Recall that three exts instructions in a row are required to exit Trap State; this is 
the only interesting timing case for exts, since any other use of it will produce undefined re- 
sults. In the optimistic case where all three exts instructions hit in the instruction stache, 
and the first instruction returned to hits in the instruction cache, the first return instruction 
will issue on the tenth cycle following the issuing of the first exts instruction. (Whenever 
Trap State is entered or exited, the instruction stache is killed. Thus, there is always an in- 
struction stache miss penalty associated with entering or exiting Trap State). Note that 
the first return instruction could interlock on the results of one of the exts instructions, be- 
cause it could use as an operand (or a result) one of the registers being reloaded by the exts 
instructions. Also, because of the penalties associated with branching to an address in a 
register, the third exts instruction will not issue until the fifth cycle following the first. 


C.7.5 ickill 


The ickill instruction has the same timing as a branch to an address in a register, but 
incurs an additional three cycle instruction fetch penalty. Note that the destination of an ick- 
ill instruction will never miss in the instruction stache or instruction cache. Timing-wise, the 
ickill instruction acts like an instruction stache miss that takes only three cycles instead of 
the usual four. 


C.7.6 iskill 


The iskill instruction always causes the third following instruction word to miss in the 
instruction stache. Other than the stache miss penalty, it has no timing effects. 
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C.8 Load/Store Timing 


The load/store unit has very complicated timing due to the banked nature of the data 
cache, the delayed nature of store instructions, the variable delays of the memory sub- 
system, and the unpredictable appearance of cache coherency requests. 


C.8.1 load instructions 


All load instructions that hit in the data cache and do not have bank conflicts have a 
latency of nine cycles. This includes eload, Idecc, Idnecc, etc. 


A bank conflict results when two addresses that agree in bits five through three are 
used twice within three cycles. The second occurrence of such an address will wait until the 
third cycle following the first occurrence before it begins executing in the load/store unit. 
Note that the bank conflict is detected in the load/store unit as it is about to execute instruc- 
tions. Due to data cache misses and other dynamically varying events in the load/store unit, 
it is possible that two load/store instructions that were issued more than three cycles apart 
will end up trying to execute on successive cycles in the load/store unit. 

load instructions that miss in the data cache are subject to variable memory sub- 
system delays. Assuming that the read request for the load was handled immediately, the 
load instruction would have a latency of roughly 55 cycles. The load/store unit, however, re- 
mains busy (and will not start executing another instruction) for an additional 12 cycles after 
returning the result of the load instruction to the register file. (The additional 12 cycles is 
the time required to write the data returned from memory into the data cache). 


There is no additional time penalty for data cache misses if the data present in the 
cache is modified and has to be written back to memory. In general, this time is overlapped 
with the memory access. 


C.8.2 store instructions 
In addition to the timing irregularities associated with load instructions (bank con- 


flicts and data cache misses), store instructions have additional timing complications due to 
their use of Delayed Write Buffers (DWBs). 


This section needs to be expanded, but it’s real complicated. 


C.8.3 Idpage 


The Idpage instruction requires all load/store instructions in front of it to complete, 
and then prevents any other load/store instructions from executing for two additional cycles. 


C.8.4 pcl and zcl 


The pcl instruction will always issue in one clock cycle. The hardware, however, has 
the option of ignoring the pel, or of cancelling it after it is started. (This cannot affect the cor- 
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rectness of programs, since pcl is simply an optimization). If the pel is completed, it can re- 
duce the delay on a data cache miss to under 12 cycles, with an additional 12 cycles during 
which no other load/store instructions can be started. (This additional time is required to 
write all the data retrieved from the memory subsystem into the data cache.) 


The zcl instruction must write four sub-lines into the data cache, and requires a mini- 
mum of 12 extra cycles to do this. zcl must also wait for any instructions in front of it to com- 
plete. 


C.8.5 dflush 
A dflush that does not write data back to memory will take TBD cycles. If a memory 
write is involved, the data cache will be tied up for TBD cycles, thus preventing other 


load/store operations from executing, but not necessarily from issuing (which would have 
stopped the entire K-1). 


C.8.6 rfec and wfec 


rfec and wfec execute locally in the K-1 processor. Both these instructions wait for 
all load/store instruction ahead of them to complete before they issue. 


C.8.7 rios and wios 


The rios and wios instructions are executed with the cooperation of the I/O system. 
Their latency is discussed more fully in Appendix F. 
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Many aspects of trap handling are implementation-dependent. This appendix de- 
scribes the details of trap handling in the Version 1 implementation of the K-1 architecture. 
It contains specifics on the data preserved by the CPU on a trap, how to read it out, and how 
to restart after a trap. It concludes with an example of a K-1 assembly language program 
that is intended to be a model for the implementation of trap handlers. 


D.1 Trap Data and Trap Recovery 


Two major groups of trap data stored by the K-1 aid in recovery from a trap. The pri- 
mary trap data, which can be read into registers with a sequence of rtrpd instructions, con- 
tains the Trap Summary, Trap Locators, and Restart PCs. The load/store trap data, which 
can be stored to memory with a sequence of slstrpd instructions, contains information used 
to recover from load/store traps. It is important to remember that load/store traps can occur 
in conjunction with other types of traps and/or interrupts. The following sections describe the 
two types of trap data. 


D.1.1 Primary Trap Data 


The primary trap data consists of the Trap Summary, the Trap Locators, and the Re- 
start PCs. The Restart PCs, shown in Figure 2-12, are described in Chapter 2. The Trap Lo- 
cators for the integer, floating-point multiply, and floating-point add units collectively are 
known as the IMA Trap Locators. The load/store and floating-point divide/square root 
units have Trap Locators called the L/S Trap Locators and the Divide Trap Locators, re- 
spectively. 


D.1.1.1 Trap Summary 


Since many traps (and interrupts) can occur simultaneously, the Trap Summary, 
shown in Figure D-1, indicates which trap types occurred in the current trap. The Trap 
Summary is divided into a Trap Report half (in the low-order 32 bits) and a Trap Status half 
(in the high-order 32 bits). The Trap Report contains all information about the occurrence of 
traps, interrupts, and machine checks. The Trap Status contains additional information that 
makes trap processing more efficient. 


Five types of traps can occur: DT indicates a decode trap, IFT indicates an instruc- 
tion fetch trap, IT indicates an integer overflow or check trap, FT indicates a floating-point 
trap including an invalid operation from a floating-point compare, and LST indicates a 
load/store trap. DT, IFT, and LST are decoded according to Table D-1. 
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Figure D-1. Trap Summary 


Machine checks, in broad terms, are anomalous conditions that should never occur in 
a correctly functioning system. They are indicative of component, connector, or design fail- 
ures, possibly of an intermittent or one-time nature. These errors’ sporadic (and unexpect- 
ed) nature makes them extraordinarily hard to diagnose and repair. The K-1 hardware de- 
tects a number of the most likely of these errors and logs them as part of the Trap Summary 
so that the operator can be notified of the problem. There are five classes of machine checks: 
ICPE indicates an instruction cache parity error, EXTE indicates an external error, IPE indi- 
cates an internal pipeline error, IFUE indicates a functional unit error, and TCMC indicates a 
trap-class machine check (an error detected along with a normal trap). EXTE, IPE, IFUE, 
and TCMC are decoded according to Table D-2. 


The external interrupts, NMI, CKI, RPI, WPI, and IOI, are discussed in the subsec- 
tion on The Trap Sequence in Chapter 2. Table 2-9 lists the priority level corresponding to 
each external interrupt. 


There are three fields in the Trap Status half of the Trap Summary: DIVX, IMAX, 
and LVL. If the floating-point divide/square root unit trapped on a divide (floating-point or 
integer) Or square root instruction, the DIVX field indexes the floating-point divide/square 
root engine that detected the first trap. The floating-point divide/square root unit contains 
two divide engines and one square root engine, each of which can be busy with a single in- 
struction (of the appropriate type) at any given time. If no floating-point divide/square root 
instruction trapped, DIVX is guaranteed to be zero. See Appendix C for more information 
about the floating-point divide/square root functional unit. Similarly, the IMAX field provides 
an index into the IMA Trap Locators to help identify the first trapping integer, floating- 
point add, or floating-point multiply instruction. If no such trap was detected, IMAX will be 
zero. If a trap was detected, IMAX will take on a value between 1 and 10 which corresponds 
to the appropriate IMA Trap Locator entry. See sub-section D.1.1.2.1 for more information 
on the IMA Trap Locators. The LVL field records status information for I/O system inter- 
rupts and contains two subfields: LVL<4>, which indicates that a parity error was detected 
on the wires over which the I/O system signals interrupts to the CPU, and LVL<3..0>, 
which indicates the level of the highest priority (pending) interrupt. If IOI is not asserted, 
the LVL field is undefined. If LVL<4> is asserted, LVL<3..0> is undefined. 
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Trap Summary 


Table D-1. Trap Encodings 


Hex Value 


0 
1 
2 
| 
4 
5 
6 
7 
0 
1 
a 
3 
4 
5 
6 
v/ 


QAWPFWDOMAIDANaRWNHO 


Y 
| 


t Recoverable trap 


= Non-recoverable trap 
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No trap 

Unused 

Illegal instruction/privilege violation 
Trace trap 

bpt executed 

trap executed 

strap executed 

xtrap executed 


No trap 

Nonexistent memory error 
Uncorrectable ECC error 
Memory-system related parity errors 
Instruction cache parity error 
Instruction cache tag parity error 
Instruction map miss 

Unused 


No trap 

Data map miss 

Illegal address (misaligned) 

Illegal access (protection violation) 
echk trap 

Data watchpoint trap + 
Page Map/Cache Tag parity error (R) 
Page Map/Cache Tag parity error (NR) 
Unused 

ECC error on first cache sub-line 

ECC error on subsequent cache sub-lines 
Memory-system related parity errors 
Nonexistent memory error 

Unused 


D-3 
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Table D-2. Machine Check Encoding 


No external machine check 

Memory read/data cache parity error 
I/O parity error 

Memory Interface Controller fatal error 


No internal pipeline machine check 

Trap State PC when not in Trap State 

USED or ISSUE from DC with PC Pipe empty 
Valid PC with PC Pipe full 


No internal functional unit machine check 
Invalid exception from square root engine 
FMULT/FADD reports divide by zero exception 
Simultaneous IDIV and FDIV exceptions 
Load/Store DONE with L/S QUEUE empty 
Load/Store ISSUE with L/S QUEUE full 

Divide inconsistency 

Unused 


0 
1 
2 
3 
0 
1 
2 
3 
0 
1 
2 
3 
4 
5 
6 
7 


Restart PC error 


No trap-class machine check 

INT Trap without exception code 
FADD Trap without exception code 
FMULT Trap without exception code 
FDIV Trap without exception code 
FDIV Trap without FDIV DONE 

INT Trap without instruction issue 
FADD Trap without instruction issue 
FMULT Trap without instruction issue 
Second Load/Store Trap 

Load/Store Trap with L/S QUEUE empty 
Unused 


FOMDYAAUWNWAWNKHKO 


. 
7 


TCMC<3..0> 
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D.1.1.2 Trap Locators 


There are three types of Trap Locators. The floating-point divide and load/store units 
have Trap Locators called the Divide Trap Locators (Figure D-2) and the L/S Trap Loca- 
tors (Figure D-3). The Trap Locators for the integer, floating-point multiply, and floating- 
point add units collectively are known as the IMA Trap Locators (Figure D-4). Associat- 
ed with the IMA Trap Locators are the IMA Trap Bits. 


2 
0 
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3 $7 6421 9 
Zero ENG|FDT] I | Unused] Instruction Trap PC 
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Figure D-3. Load/Store Trap Locator 
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Figure D-4. IMA Trap Locator 


Each Trap Locator contains a Trap PC and Instruction Page ID, I0 and I1 issue 
bits, a valid bit V, and zero or more trap type fields. If the V bit is not set, the Trap Locator 
should be ignored. The Trap PC and Instruction Page ID combine to indicate the address 
of the instruction word containing the offending instruction (i.e., the location of the trap). The 
Trap PC is simply the low-order address bits of the PC that can be mapped by a single in- 
struction page table entry. The Instruction Page ID is a 3-bit field which represents one of 
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the eight instruction page table entries and uniquely identifies the high-order bits of the PC. 
(See the subsection on Instruction Mapping in Chapter 2). Each Divide Trap Locator has 
an additional 3-bit field, ENG, which indicates the divide/square root engine that executed 
the trapping instruction. Each IMA Trap Locator has an additional one-bit field, BR, that 
indicates whether a branch (i.e., br, call, jump, exts, or ickill) was executed by the instruc- 
tion at the corresponding PC. Unused bits are provided for architectural expansion. 


The IMA Trap Bits are a collection of the trap type fields associated with the IMA 
Trap Locators, separated for implementation reasons. Each Trap Locator has a correspond- 
ing IMA Trap Byte (Figure D-5). The IMA Trap Bytes are grouped into a number of IMA 
Trap Words (Figure D-6), each of which holds five IMA Trap Bytes. 


Figure D-6. IMA Trap Word 


The FDT, FMT, FAT, and IT fields indicate the trap type(s) associated with the 
particular Trap PC/Instruction Page ID according to the decoding in Table D-3. Note that 
more than one trap may be associated with a Trap PC, hence more than one trap type field 
may be non-zero. There is no field indicating the type of load/store trap on a per Trap Locator 
basis because the first load/store trap that occurs freezes the load/store queue. Since only 
one load/store trap can occur, its type is reported in the Trap Summary. 


The 10 and II fields in each type of Trap Locator help to determine which instruction 
in the instruction word is responsible for the trap. Table D-4 shows how to decode these 
fields when the instruction word at the Trap PC contains two 32-bit instructions. Table D-5 
gives the decoding when one 64-bit instruction is at the Trap PC. In the case where there 
are two 32-bit instructions and both issue bits are set, the determination of which instruc- 
tion(s) trapped is made by consulting the trap type fields and the opcodes of the two instruc- 
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tions. For example, if both floating-point add and floating-point multiply traps are reported 
in the same Trap Locator, then one of the opcodes of the instruction word addressed by the 
Trap PC must be a floating-point add and the other must be a floating-point multiply. If, for 
example, only an integer trap is reported, then the trapping instruction is whichever of the I0 
and I1 instructions is an integer unit instruction. (Consult Appendix C for an explanation of 
why, when both issue bits are set, it cannot be the case that both the I0 and I1 instructions 
were executed by the same functional unit.) 


Table D-3. Trap Bit Field Encodings 


No trap 
Inexact 
Overflow 
Overflow and inexact 

Underflow 

Underflow and inexact 

Invalid operation (except floating-point compare) 
Divide by zero 


No trap 
Integer overflow T(not disabled by ATE bit in PS) 
Invalid operation (floating-point compare) 

Check trap (chk instruction) 


+ Due to addt or subt instructions detecting overflow. 


Table D-4. 10 and I1 Decoding for 32-bit Instructions 


Trapping 
Instruction 


not possible 


Il 
10 
10 and/or nt 


T If the trap type is DT then only I1 
could have caused the trap 
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Table D-5. 10 and I1 Decoding for a 64-bit Instruction 


Trapping 
Il , 
Instruction 


none 
not possible 
10 


not possible 


From Tables D-4 and D-5, it is apparent that the execution of an I1 instruction can- 
not be distinguished from the execution of a 64-bit IO instruction, except by looking at the for- 
mat control bits of the IO instruction. For this reason, programmers should not put garbage in 
the unused IO half of an instruction word. (In fact, good programming practice dictates that 
the IO half of an instruction word should always contain a valid instruction.) 


D.1.1.2.1 IMA Trap Locators 


The IMA Trap Locators and IMA Trap Bits cooperate to maintain the PC, trap 
type, and issue bits of the last ten non-Trap State instruction issue cycles. An instruction 
issue cycle is defined as a single clock cycle in which one or two instructions are issued. 
IMA Trap Locator 1 stores information for the oldest such instruction(s) issued (if any), 
IMA Trap Locator 10 stores information for the youngest. The IMA Trap Bits are divided in- 
to two IMA Trap Words, each of which stores the trap type fields for five IMA Trap Loca- 
tors. Each Trap Word maintains five IMA Trap Bytes labelled 1 through 5. IMA Trap Word 
0 maintains the trap type information for IMA Trap Locators 1 through 5 in Trap Bytes 1 
through 5, respectively. IMA Trap Word 1 maintains the trap type information for IMA Trap 
Locators 6 through 10 in Trap Bytes 1 through 5, respectively. These ten Trap Bytes are col- 
lectively referred to as IMA Trap Bytes 1 through 10, and are numbered identically to the 
IMA Trap Locators. 


The Trap Locator in which the first trapping integer, floating-point multiply, and float- 
ing-point add unit instruction will be reported is determined by the number of instruction is- 
sue cycles that occurred after the trapping instruction is issued and until instruction issuing 
is suspended (due to the trap). If nine instruction issue cycles (the maximum) occurred, IMA 
Trap Locator 1 and IMA Trap Byte 1 hold the PC, trap type, and issue bits for the trapping in- 
struction. If no instruction issue cycles occurred after the trapping instruction was issued 
(and before the trap was detected), IMA Trap Locator 10 and IMA Trap Byte 10 hold the 
PC, trap type, and issue bits for that instruction. At least one of the three fields: IT, FAT, or 
FMT will be nonzero in this Trap Byte. The IT, FAT, and FMT fields of all IMA Trap 
Bytes corresponding to instructions issued before the first trapping instruction will be zero. 
The IT, FAT, and FMT fields for instructions issued after the first trapping instruction de- 
pend on whether the corresponding instructions also trapped. Since one or two instructions 
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can be issued in a single clock cycle, zero, one or two of the IT, FAT, and FMT fields will 
be non-zero, indicating which instruction(s) trapped. The ninth clock cycle following the is- 
sue of the trapping instruction is the last cycle in which an instruction could have been issued 
prior to the reporting of traps (and the consequent suspension of instruction issuing). 


The IMAX field in the Trap Summary indicates the number of the Trap Locator/Trap 
Byte containing the PC, trap type, and issue bits for the first issued trapping integer, floating- 
point add, or floating-point multiply instruction. A maximum of 4, 7, or 10 instruction issue cy- 
cles can occur after the issue of a trapping integer, floating-point add, or floating-point multi- 
ply instruction, respectively. 

Once the K-1 exits Trap State, all ten IMA Trap Locators are invalidated. Subse- 
quently, if another trap occurs and there have been less than ten instruction issue cycles 
since exiting Trap State, not all IMA Trap Locators will contain valid information. Invalid 
IMA Trap Locators will always have their V bit clear. 


The IO and [1 fields in the IMA Trap Locators serve to identify which instruction(s) 
(10, 11, or both) were issued. 


For example, in the following program: 


XX: move %r24,%rl; move %125,%r2 
move %r26,%r4; move %or27 ,Yor5 
add.d %r1,%r2,%r13 ; add.d %r4,%or5,%or6 
mult.d %r3,%r7 ,%r8 ; add.d %r9,%r10,%rl11 
addt %r12,%rl13,%rl4; addt %r15,%rl6,%rl7 
move %r20,%r121 ; nop 


if we assume that all six add.d, mult.d, and addt instructions generate overflow traps, then 
the IMA Trap Locators/Trap Bytes will be as follows: 


Locator 


AT | FMT 


Oorroo°onco°nceoo Ei 


eevee 
coounnocce| F 
cocnococco| 2 


eK OF KF Or OK OC 


KH OF KR OK OK Ore 


Note: IMAX = 5 
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D.1.1.2.2 L/S Trap Locators 


The eight L/S Trap Locators correspond to the maximum of eight possible 
load/store instructions that could have been issued but not completed at the time of a 
load/store trap. All of these instructions will have been stopped before altering either gener- 
al registers or memory. 


If there is no load/store trap reported in the Trap Summary, all L/S Trap Locators 
should be ignored (because their V bits will be off). If there is a load/store trap reported in 
the Trap Summary, then L/S Trap Locator 1 will give the Trap PC/Instruction Page ID and 
issue bits of the trapping instruction. L/S Trap Locators 2 through 8 will give the same infor- 
mation for the following load/store instructions that were issued (up to seven). Note that no 
information is delivered about whether or not these instructions would have trapped had they 
been allowed to complete. Part of the software simulation required in a trap handler is to de- 
termine if these instructions would have trapped, and if so, to simulate the appropriate 
load/store traps. If fewer than eight load/store instructions were in the load/store queue at 
the time of the load/store trap, then fewer than eight L/S Trap Locators will be valid. 


D.1.1.2.3 Divide Trap Locators 


The three Divide Trap Locators correspond to the two possible divide and one pos- 
sible square root instructions that could be outstanding at the same time. Any number from 
zero to three simultaneous divide/square root traps are possible. A Divide Trap Locator, if 
valid, gives the location, trap type, issue bits, and engine number for one divide/square root 
instruction. 


Divide Trap Locator 1, if valid, corresponds to the oldest trapping divide/square root 
instruction, Divide Trap Locator 2 to the next oldest, and Divide Trap Locator 3 to the young- 
est. One should note that the number of valid Divide Trap Locators will be equal to the num- 
ber of trapping divide/square root instructions, something which is not true for either the IMA 
Trap Locators or the L/S Trap Locators. In addition, because of the variable latencies of the 
divide and square root instructions, the first divide instruction to trap may not be the oldest 
outstanding divide instruction. 


D.1.2 Reading Primary Trap Data 


Primary Trap Data is read one word at a time, using a sequence of rtrpd instructions. 
The srca operand of rtrpd controls which type of primary trap data is read as shown in Table 
D-6. Use of values for srca not shown in this table will lead to unpredictable results. With 
the exception of the Trap Summary and the IMA Trap Words, which can be read any num- 
ber of times while in Trap State, Primary Trap Data can only be read once while in Trap 
State (i.e., the read operation is destructive) and must be read out in a pre-defined, type-de- 
pendent order. 


The Restart PCs can only be read once while in Trap State, and they are read in the 
following order: First Restart PC, Second Restart PC, and Third Restart PC. 
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Table D-6. rtrpd srca encoding 


Trap Summary 
Restart PCs 
IMA Trap Word 0 


IMA Trap Word 1 
IMA Trap Locators 
L/S Trap Locators 
Divide Trap Locators 
Unpredictable results 


The L/S Trap Locators can only be read once while in Trap State. The first entry 
read is L/S Trap Locator 1, which corresponds to the first (if any) trapping L/S instruction. 
Subsequent entries are read sequentially with L/S Trap Locator 8 read last. 


The Divide Trap Locators also can only be read once while in Trap State. The first 
entry read is Divide Trap Locator 1; subsequent entries are read sequentially with Divide 
Trap Locator 3 being read last. 


Similarly, the IMA Trap Locators can only be read once while in Trap State. The 
first entry read is IMA Trap Locator 1; subsequent entries are read sequentially with IMA 
Trap Locator 10 read last. IMA Trap Words 0 and 1, as indicated in Table D-6, are accessed 
separately using different srca arguments. As indicated earlier, the IMA Trap Words can be 
read any number of times while in Trap State. 


D.1.3 Load/Store Trap Data 


In the event of a trap, the K-1 attempts to empty all functional unit pipelines before 
starting the trap handler. This happens automatically for arithmetic pipelines by waiting 
enough cycles for any outstanding operations to complete. However, since load/store in- 
structions must complete in order, if one load/store instruction traps, no load/store instruc- 
tions after that can be allowed to finish. Therefore, the load/store pipeline may not be empty 
if there is a load/store trap. 


There can be as many as eight load/store instructions in progress when a load/store 
trap is detected. The load/store trap data contains enough information about all of the in- 
structions in the load/store pipeline, including the one that trapped (which will always be at 
the head of the load/store queue), to allow a software routine to recover from the trap by sim- 
ulating these already-issued, but not executed, instructions. The type of load/store trap, if 
any, is contained in the Trap Summary. If there was no load/store trap, then the information 
contained in the load/store trap data will not be valid and should be ignored. 


Note that it would not be correct to restart the program at the PC of the trapping 
load/store instruction (assuming that the cause of the trap had been rectified). Any non- 
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load/store instruction that appeared in the program between the trapping load/store instruc- 
tion and the first Restart PC will have already completed before the trap. Restarting the pro- 
gram in this fashion would cause such non-load/store instructions to be re-executed, produc- 
ing invalid results. For example, an integer instruction issued shortly after a trapping 
load/store instruction will complete before the load/store trap is detected. Execution could 
not be resumed at the location of the trapping load/store instruction without re-executing the 
integer instruction. This is why the instructions in the load/store pipeline at the time of a 
load/store trap must be software simulated. 


For each of the eight possible load/store instructions in the load/store pipeline at the 
time of a load/store trap, certain trap data must be saved. This data, called Load/Store 
Trap Data, usually contains the address generated by the load/store instruction and other 
control information including the opcode and the destination register (if any). The slstrpd in- 
struction is used to write this data to memory. For certain instructions, the address con- 
tained in Load/Store Trap Data is replaced by other information. The format of Load/Store 
Trap Data is shown in Figure D-7. 


Figure D-7. Load/Store Trap Data 


In Figure D-7, OP corresponds to the field of the same name in the instruction for- 
mats. It is possible that there were fewer than eight instructions in the load/store pipeline at 
the time of the trap. This situation is indicated by delivering zero in the OP field for those en- 
tries that correspond to the unfilled load/store queue slots. The unused entries will always 
be in a single group at the end of the Load/Store Trap Data. 


For most load/store instructions, the Data field is interpreted in addr format as 
shown in Figure D-8. RC corresponds to the field of the same name in the instruction for- 
mats and will only be valid if the instruction uses that field. Bits 31..18 of the Data field are 
undefined, and the concatenation of bits 49..32 and 17..0 is the virtual address, Addr, refer- 
enced by the instruction. This interpretation of the Data field holds for all load, eload, load- 
cpu, store, storecpu, Idecc, Idnecc, zcl, pcl, swat, wps, echk, relf, rfec, and wfec instruc- 
tions. For instructions such as rfec and wfec, which do not compute an address, the Addr 
field is identical to srca of the instruction. 


For each store or swat instruction in the load/store pipeline at the time of a 
load/store trap, the K-1 remembers the data that was to be stored by that instruction. This 
data is kept in a sequential list, each entry of which (called Load/Store Trap Store Data) 
may be written directly to memory with the slstrpd instruction. The first entry in the list cor- 
responds to the first store or swat instruction, the second entry corresponds to the second 
store or swat instruction, etc. 
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Addr<35..18> Undefined Addr<17..0> 


(18) (14) (18) 


Figure D-8. addr Data Field Format 


Note that for stores of less than 8-byte precision (i.e., byte, halfword, and word), only 
the least significant part of the Load/Store Trap Store Data for that instruction contains 
meaningful data. In other words, the data saved is not identical to the contents of srcc of the 
store instruction, except for 8-byte stores and swat. 


For all welf, wdwp, dflush, rios, and wios instructions, the Data field is interpreted 
in combined srca/srcb format as shown in Figure D-9. 


srea<35....18> Undefined sxrcb srcea<17..0> 
<6..0> 


Ci) 


(18) (7) (18) 


Figure D-9. Combined srca/srcb Data Field Format 


In this format, RC has the same definition as in the addr format. The low-order 36 bits of the 
instruction’s srca operand are packed into bits 49..32 and 17..0. The low-order 7 bits of its 
srcb operand reside in bits 24..18. srcb<6> is only defined for rios and wios. Bits 31..25 are 
undefined. These 36 bits of srca and 7 bits of srcb are provided regardless of the presence or 
size of the instructions’ operands. 


The Data field for the Idpage instruction is interpreted in Idpage format as shown in 
Figure D-10. 


Defined 
(11) 


Figure D-10. ldpage Data Field Format 
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In this format, Virtual Page Number (or VPN) corresponds to the field of the same name in 
Figure 2-8 (the srca operand to the Idpage instruction) and resides in bits 49..32 and 17..16 
as VPN<35..18> and VPN<17..16>, respectively. PPN<35..32> and PPN<31..16>, when 
concatenated, compose the Physical Page Number field. The Physical Page Number, R, 
W, and S fields correspond to fields of the same names in Figure 2-9, the srcb operand to the 
Idpage instruction. 


D.1.4 Storing Load/Store Trap Data 


Load/Store Trap Data is stored, one 64-bit word at a time, using a sequence of 
slstrpd instructions. Load/Store Trap Data can be stored any number of times while in Trap 
State. The srca operand of slstrpd specifies where in memory to write Load/Store Trap 
Data. The srcb operand selects which Load/Store Trap Data to store as shown in Figure 
D-11. 


INS |Unused]| TY 


(3) (1). C2) 


Figure D-11. slstrpd srcb Operand Format 


The TY field controls which unit (Load/Store Trap Data, Load/Store Trap Store 
Data, or Load/Store PS) of Load/Store Trap Data to store, and the INS field selects the 
instruction in the load/store pipeline whose data is stored. The encoding of the TY field is 
given in Table D-7. For Load/Store Trap Data, the INS field has a simple binary encoding, 
with zero referring to the oldest instruction in the load/store pipeline, and seven referring to 
the youngest. 


Table D-7. TY Encoding 


Unit of Load/Store Trap Data 


Illegal 
Load/Store Trap Data 

Load/Store Trap Store Data 
Load/Store PS 


When the TY field selects Load/Store Trap Store Data, the INS field specifies which 
store or swat instruction’s store data should be written to memory. A value of zero indi- 
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cates the first store or swat in the load/store queue, a value of one indicates the second, 
etc. For example, if only the third and fifth instructions in the load/store pipeline at the time 
of a trap were store or swat instructions, then only the first two Load/Store Trap Store Data 
words (INS = 0 and INS = 1) would be meaningful. 


If the TY field is 3, then the INS field is ignored and slstrpd returns certain bits of the 
Processor Status register that were in effect at the time of the load/store trap. Since one or 
more wps instructions may have been in the load/store pipeline, these bits may not be the 
same as the current Processor Status. The trap recovery routine should use this 
load/store PS while simulating instructions up to the first wps instruction encountered in 
the load/store pipeline (if any). Upon encountering each wps instruction, the trap recovery 
routine should use the new PS (from the wps srca operand) for further simulation. The for- 
mat of the load/store PS is the same as that returned by an ordinary rps instruction, except 
that only the following fields are defined: 


Byte Order Low-to-High 

Process Key<12..0> 

User Protection 

User Mode Store 

User Mode Load 

Small Address Compatibility Mode 
Early Load Alignment Trap 


D.2 Use of exts Instructions 


The exts instruction is used to reload the Restart PCs and return from a trap. exts 
must be used in a very stylized manner to correctly restart the trapping program and exit 
from Trap State. Once the first exts is executed, two more exts’s must be immediately ex- 
ecuted with no intervening instructions (enabled or disabled). Three sequential exts instruc- 
tions are necessary to reload the three Restart PCs. Before the instruction at the address of 
the first Restart PC is executed, the processor will have exited Trap State. Since exts is 
restarting a trapping program, it both provides a way to return to the program, and to reload 
the registers needed to hold the Restart PCs. See the exts instruction description for more 
details. 


D.3 Example Trap Handler 


# include "~chris/notes/trap_state.s 
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E.1 Introduction 


The K-1 memory system is designed for very high memory bandwidths so that it can 
support a large number of high-speed processors and I/O Controllers. To obtain sufficient 
bandwidth, the datapath to/from the memory system is very wide, and the memory system it- 
self is banked and pipelined. A request to the memory system will “tie up” one bank of 
RAMs during their access time, but will not tie up the entire memory system (so that other 
memory requests to different banks can be started in parallel). The number of memory re- 
quests that can be done in parallel in the initial implementation of the K-1 is four. 


The memory itself is protected by an Error-Correcting Code (ECC). This code will 
correct any single-bit error in a 64-bit word, and will detect any double-bit error in a 64-bit 
word. All errors (both single- and double-bit) are also logged so that the Console can de- 
tect faulty memory and remove it from service. Note that the memory system’s idea of which 
bytes in a cache line comprise a “word” differs significantly from the processor’s. (See the 
section on Data Order for more details). 


This appendix discusses the memory system from the point of view of a diagnostic 
programer. 


At the time this manual went to press, this chapter was still incomplete. 


E.2 Data Order 


Due to the extreme width of the memory datapath in the K-1, the data between the 
processors and the memory system is sliced in an implementation dependent way. Certain 
bytes are grouped together logically in the memory system in a different fashion than the 
grouping in the processor. All discussions of data order will refer to cache lines and cache 
sub-lines, as defined in the section on Cache Effects in Chapter 2. 


All data transfers from/to a CPU to/from the memory system consist of one cache line 
made up of four cache sub-lines. The cache sub-lines are always consecutive pieces of the 
cache line. The sub-line transferred first depends on the address of the reference. For data 
cache misses and writebacks, and for cache coherency responses, the sub-line containing the 
referenced word will always be transferred first. For instruction cache misses, the first sub- 
line (the one with the most low-order zero bits in its address) is always transferred first. 
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Within each sub-line, the bytes which are covered by common ECC bits in the memo- 
ry system are not the obvious groups of consecutive bytes one would expect. If the bytes in 
a sub-line are numbered from 0 to 63, then bytes 0, 1, 8, 9, 16, 17, 23, and 24 are covered by 
the same ECC bits. That is, any single-bit error in these eight bytes will be corrected, and 
any double-bit error in these eight bytes will be detected. This pattern repeats seven more 
times in a sub-line, so that bytes 2, 4, 6, 32, 34, 36, and 38 start eight-byte groups with the 
same relative offsets. These ECC groupings within a sub-line are illustrated in Figure E-1. 
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Figure E-1. ECC Groupings Within a Sub-Line 


In Figure E-1, the rows of data are eight sequential bytes from the CPU’s point of 
view (and the numbering shown is the byte address within a sub-line). The eight, shaded 
regions represent the groups of bytes that are covered by the same ECC bits. The circled lo- 
cations show the positions used for reading and writing the ECC bits within each group using 
the Idecc instruction and the special ECC-writing form of dflush. 
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E.3 Error-Correcting Code 


The memory system of the K-1 uses a SECDED (Single-Error Correcting, Double- 
Error Detecting) code. Mathematically, the code would be called an extended Hamming 
code. The overhead for this code is one byte for each 64 bits. 


The code ... 
The check bits can be calculated as... 


E.4 CPU Features for Diagnosing the Memory System 


uncorrectable memory error traps 
Idecc 

Idnecc 

dflush 


E.5 Console Interactions and Error Logging 


MATR 
error logging 


E.6 Cache Coherency 


Cache coherency 
Bandwidth — normal and coherency 


E.7 Memory System Timing 


The memory system runs at a clock rate that is one third of the processor’s clock. If 
there is no contention, it takes the memory system approximately 16 memory clock cycles to 
respond to a request from a processor. 


From the time when the processor detects a miss in the data cache for a load instruc- 
tion to the time when the result is written into the register file, approximately 52 processor 
clock cycles elapse. The load/store unit is tied up for an additional (approximately) 10 cycles 
while the rest of the data (the other sub-lines) are written into the data cache. 


Note that shared requests, which use the cache coherency scheme, can take longer 
since the memory system waits for all processors to respond before it returns the data. 
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F.1 Introduction 


The I/O system of the K-1 provides very high bandwidths by using a large number of 
independent I/O channels, each of which is capable of very high I/O bandwidths. The K-1 
system supports a number of I/O Controllers (IOCs), which interface to the memory system 
in much the same way that a processor does. In the initial implementation, there can be one 
or two IOCs, each IOC can support up to eight I/O Channel Adapters (IOCAs). Each IOCA 
can support up to eight I/O channels. The I/O channels themselves are compatible with the 
High-Speed Channel (HSC) standard, and operate full-duplex at speeds up to 100 Mega- 
bytes per second. 


The rios and wios instructions provide communication with the IOCs and the IO- 
CAs. Status information can be read from the I/O system with a rios instruction. The wios 
instruction can be used to change control settings, to initiate I/O operations, and for multipro- 
cessor synchronization operations. 

Table F-1 provides a summary of the rios and wios instructions. Section 2 describes 


in detail the operation of the wios and rios instructions. Section 3 explains the format of the 
control and status registers used in the I/O system. 


F-2 Introduction F.1 


Table F-1. rios/wios Instruction Summary 


TYPE 
srca srcod srca rast 
i 6 re se 
co00 [sjeam Savin Regent SO IOCAca on Hic X|Dosvewe 
F 0001 [Write Chanel Contol Regier [OGIOCAC20>CHel.0>|CCRDaadx@| 
1 [ oo10 [Wie Ouput Chana Fld Reps []0C3OCA2.0>.CHct.0>X|Ount Rea | —____—_ 
© [oor [Wit se. Sv. Res, Paks Pinter [OCIOCAC20>CHet.0>.X|SSRP Poin |__| 
C [e100 [ctamsiteset —______fociocacz.0sctia.0>x|Dontewe |__| 
Alo ocares hociocaa.mxoom |powrese | 
enc iwi 


1000 |Write CRP Register File CPU&2..0>, LVL<3..0> Q Address32).) [ee 
eee eee 
aay Ss Pe we PN ae = tee ad fp rokucwiel 
1100 CPU<2..0>, XXXX 
1101 |Set NCRP CPU CPU<z2..0>, XXXX ontveare | 
1111 _|1/O Subsystem Reset [Don'tearo|Don'tewrs 
Read Channel Status Register IOCJOCA<2..0>,REG<1..0>,X Ch. Status 4 x (8) 


Read Input Channel I-Field Register _|IOC,IOCA<2..0>,CH<1..0>,X Input I-Field (32) 


anom ¥ 
| 


— te 


Read IOCA Status Register TOC, IOCA<2..0>,REG<1..0>,X IOCA Status 4 x (8) 
001 |Read CRP Register File (clr LVL) _|CPU<2..0>, LVL<3..0> Q Address(32) 

1010 |Read MPSync flags (clr bits) CPU<z2..0>, LVL<3..0> Bit mask(32) MPSync _flags(32) 
1011 |Read MPSync flags (set bits) CPU<2..0>, LVL<3..0> Bit mask(32) MPSync flags(32) 


Read Sys. Srvc. Req. Packet Pointer |IOC,IOCA<2..0>,CH<1..0> KDon'ecare —_| SSRP Pointer 


Read CRP Register File CPU<2..0>, LVL<3..0> Q Address(32) 


Re ie le |e = _ 
sles SIE BIE BIS 3 


1111 


* «A tl-channels” flag 
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F.2 Instruction Descriptions 


F.2.1 wios 

Instruction: wios srca,srcb,rdst 
Opcode: FD 

Operation: Write I/O Status. 

Operands used: srca, srcb 

Results stored: rdst 

Legal In: Supervisor mode only 

Functional unit: Load/Store 

Exceptions Illegal instruction/privilege violation 
Description: 


Four bits of the wios (Write I/O Status) instruction’s srca operand (srca<35..32>) 
define the operation TYPE field. The srcb operand ADDR field (srcb<6..0>) supplies an ad- 
dress whose contents are dependent on the value contained in the TYPE field. The sr- 
ca<63:36> and srcb<63..7> bit values currently are don’t care and are ignored. The fields 
are shown in Figure F-1. 


The different TYPE operations can be divided into two groups; those directed to the 
IOC and those directed to an IOCA. These two groups are indicated by the MSB of the 
TYPE field in the srca operand (srca<35>). A wios operation with srca<35>=0 is directed 
at a channel on an IOCA, and with srca<35>=1 is directed to an IOC. 


srca UNUSED TYPE DATA 
(28) (4) (32) 
6 
3 
(57) (7) 


Figure F-1. srca and srcb Operand Format 


oo 


>) 
nO 
oo 
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The operations performed by the different TYPE designations are System Service Re- 
quest, Write Channel Control Register, Write Output Channel I-Field Register, Write Sys- 
tem Service Request Packet Pointer, Channel Reset, IOCA Reset, Write CRP Register File, 
Interprocessor Interrupt, Set NCRP CPU, and I/O Subsystem Reset. The description of each 
operation and the contents of the srca and srcb operands follows below. 


IOCA-TYPE wios Instruction Operations 


For all IOCA-type wios instructions, the srcb operand ADDR field specifies the ad- 
dress of the channel that processes the instruction. The channel address can be divided into 
three sub-fields: one IOC select bit (srcb<6>), three IOCA select bits (srcb<5..3>) and two 
channel select bits (srcb<2..1>). The LSB of the ADDR field (srcb<O>) is not currently used 
(and its value is ignored). If there is only one IOC in the system, the IOC select bit is 
checked against a configuration parameter in the IOC, and if there is a discrepancy, an error 
is detected. 


Type Description 


0000 System Service Request: This operation indicates a System Service 
Request Packet is ready to be read from system memory. The srca operand 
DATA field (srca<31..0>) is not used. 


0001 Write Channel Control Register: This operation writes one byte of the 
32- bit srca operand DATA field (srca<31..0>) into the 8-bit Channel 
Control Register. The most significant byte is used for Channel 0, the next 
most significant byte for Channel 1, etc. The two channel select bits 
(srcb<2..1>) select the channel to be written. The definitions of the channel 
control bits and their functions are described later in section 3. 


0010 Write Output Channel I-Field Register: This operation writes the srca 
operand DATA field (srca<31..0>) into the Output Channel I-Field Register. 
The DATA field is a 32-bit value which is presented on the output HSC 
channel during a connection request sequence. The function of the data 
contents is implementation specific. 


0011 Write System Service Request Packet Pointer: This operation 
writes the srca operand DATA field (srca<31..0>) into the System Service 
Request Packet Pointer (SSRP Pointer). The DATA field is a 32-bit system 
memory address which points to a System Service Request Packet. The bits 
in the operand correspond to the memory subsystem physical address lines 
(Addr<35..4>). 


0100 Channel Reset: This operation generates a channel reset for the 
channel specified by (srcb<2..1>). The srca operand DATA field 
(srca<31..0>) is not used. 


0101 IOCA Reset: This operation generates an IOCA reset for the IOCA 
specified by (srcb<5..3>). The srca operand DATA field (srca<31..0>) is not 
used. An IOCA Reset also resets all the channels attached to the IOCA. 

IOC-TYPE wios Instruction Operations 


There are four different types of operations performed by the wios instruction to an 
IOC: Write CRP Register File, Interprocessor Interrupt, Set NCRP CPU, and I/O Subsystem 
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reset. The ADDR ffield is further subdivided into 2 fields, CPU=srcb<6..4> and 
LVL=srcb<3..0>. The most significant CPU bit (srcb<6>) determines the IOC address. If 
this bit is 0, then the operation is destined for IOC(Q); if it’s a 1, then the operation is des- 
tined for IOC(1). IOC(O) is physically connected to CPU(3..0) and IOC(1) is physically con- 
nected to CPU(7..4). The contents of the srca and srcb operands, and a description of each 
type of operation follows: 


Type Description 


1000 Write CRP Register File: This operation writes to the CRP register 
file on the IOC. The CRP register file contains the Queue Address Pointers 
(LVL 8-14), the NCRP pointers (LVL 5), and the MPSync flags (LVL 0-3). 
The location, to be written, is addressed by the CPU (srcb<6..4>) and LVL 
(srcb<3..0>) variables contained in the ADDR field of the srcb operand. All 
Write CRP Register File operations are allowed to occur across IOCs. 
Figure F-2 illustrates the format of the CRP and NCRP pointers. 


For LVL = 8-14, the operation will update the Response Packet (CRP) 
Queue Pointer from the DATA field (srca<31..0>) of the srca operand. 
NOTE: The entire 32-bit value written, will appear in the Ist CRP link 
field. Subsequent CRP link fields will contain the previous packet’s 30- 
bit memory address. (CRP linkage forces the 2 LSBs to be loaded with zero 
due to the size of the I/O Bus Address) 


For LVL = 5, the operation will update the Negative Response Packet 
(NCRP) Queue Pointer from the DATA field (srca<31..0>) of the srca 
operand. NOTE: The entire 32-bit value written, will be used as the 
system address of the Ist NCRP. Even though all 32 bits are stored in 
the register file, the 4 lowest bits will not be used, since NCRP are 256- 
byte relative. Therefore the 4 low order bits should be considered as 
don’t care other than for diagnostics. Subsequent NCRP addresses will 
be generated automatically in the IOC by incrementing the 4 bit address 
field <11..8> modulo 256 bytes, shown above. This field is independent 
of the actual NCRP counter value; so the NCRP queue may exist on any 
64-byte boundary with a maximum packet size of 256 bytes. When 
directed at the NCRP CPU location, this operation type, will cause an IOC 
internal flag, which blocks the writing of response packets to memory, to be 


~<— Data Bits 


~<— Byte Address Bits 


Ce ae TT Se 


NCRP address increment ant 


Figure F-2. CRP and NCRP Pointer Structure 
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reset (thus enabling NCRP writes to memory). This operation also clears the 
IOC internal NCRP counter to zero. 


Non-aligned accesses for NCRPs wraparound a line in memory. 


For LVL = 0-3, the operation will write the DATA (srca<31..0>) field of 
the srca operand into the MPSync locations. NOTE: Do not use this TYPE 
for MPSync accesses, since consistency is not maintained. It is 
provided for Diagnostics only. 


For_all other values of LVL, the operation will complete normally and 
write the location specified, although these locations are not currently used by 
the IOC. 


1100 Inter-Processor Interrupt: This operation type will cause an interrupt 
at level 2 to be generated to the CPU indicated. The operand srcb defines the 
CPU (srcb<6..4>) which will be interrupted. These interrupts are also 
allowed to CPUs which cross IOCs. 


1101 Set NCRP CPU: This operation type sets a pointer in BOTH IOCs to 
indicate; which CPUs’ queue to put an NCRP (Negative Channel Response 
Packet) into, and then interrupt at level 5. The CPU indicated in the ADDR 
field (srcb<6..4>) of the srcb operand is latched as the CPU number to which 
an NCRP is directed. 


This operation also clears both IOC’s NCRP counters to zero. 
Therefore, when changing NCRP CPUs, be sure that the LVLS5 pointer 
location for that CPU is properly initialized to handle 16 more NCRPs. Any 
CPU may set these registers. 


1111 I/O Subsystem Reset: This operation type issues a reset to both IOCs 
and all IOCAs; which will completely clear control state in all IOCs, IOCAs, 
and channels. The I/O subsystem is then ready to resume normal 
functionality. The normal use of this operation is for error recovery. 
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F.2.2 rios 

Instruction: rios srca,srcb,rdst 
Opcode: FE 

Operation: Read I/O Status 

Operands: srca, srcb 

Results stored: rdst 

Legal In: Supervisor mode only 

Functional unit: Load/Store 

Exceptions: Illegal instruction/privilege violation 
Description: 


The rios (Read I/O Status) instruction is used to check status of the I/O subsystem. 
The srca and srcb operands use the same field definitions as the wios instruction (Figure F- 
1). The srca operand data (srca<31..0>) is used only by the IOC for MPSync operations. 
The result returned in rdst is dependent on the TYPE field. 


As for the wios instruction, the MSB of the TYPE field (srca<35>) indicates whether 
the rios operation is directed to the IOCA or to the IOC. 


The operations performed by the different TYPE designations are Read Channel Sta- 
tus Register, Read Input Channel I-Field Register, Read System Service Request Packet 
Pointer, Read IOCA Status Register, Read CRP Register File, Read CRP Register File and 
Clear Level, Clear MPSync bits, and Set MPSync bits. The description of each operation and 
the contents of the srca and srcb operands and the rdst result follows below. 


IOCA-TYPE rios Instruction Operations 


There are four different rios operations that are directed to the IOCAs: Read Channel 
Status Register, Read Input Channel I-Field Register, Read System Service Request Packet 
Pointer, and Read IOCA Status Register. For all IOCA-type rios instructions, the srcb op- 
erand ADDR field specifies the address of the channel that processes the instruction. The 
channel address can be divided into three sub-fields: one IOC select bit (srcb<6>), three 
IOCA select bits (srcb<5..3>) and two channel select bits (srcb<2..1>). The LSB of the AD- 
DR field (srcb<0>) is not used (and its value is ignored). If there is only one IOC in the sys- 
tem, the IOC select bit is checked against a configuration parameter in the IOC, and if there 
is a discrepancy, an error is detected. 
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Type Description 


0001 Read Channel Status Register: This operation reads one byte from 
each of the four 32-bit Channel Status Registers and returns the 32-bit value 
to rdst. The channel select bits (srcb<2..1>) select the proper byte. The 
definition of the channel status bits and their functions are described in 
section 3. The format of rdst is shown in Figure F-3. 


As an example, when a Read Channel Status Register rios is issued 
with srcb<2..1> = ’00’; CSR Byte 0 is requested. The returned data will be as 
follows: 

rdst Byte 0 - HSC 0, CSR Byte 0 
rdst Byte 1- HSC 1, CSR Byte 0 
rdst Byte 2 - HSC 2, CSR Byte 0 
rdst Byte 3 - HSC 3, CSR Byte 0 


Likewise, when srcb<2..1> = ’01’, Byte 1 will be returned from the CSRs 
of all the channels. 


0010 Read Input Channel I-Field Register: This operation reads the Input 
Channel I-Field Register and returns the 32-bit value to rdst. The I-Field 
data is received from the input HSC during a connect sequence. The function 
of the data contents is implementation specific. The format of rdst is shown 
in Figure F-4. 

0011 Read System Service Request Packet Pointer: This operation reads 
the System Service Request Packet Pointer (SSRP Pointer) and returns the 
32-bit value to rdst. The SSRP Pointer is a system memory address for the 
most recently executed System Service Request Packet. The bits in the 


6 
3 
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5 

2 
UNUSED Channel Status 

ia (32) (4x8) 


Figure F-3. Type 0001 rdst Register Format 
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UNUSED I-Field Data 
rdst (32) (32) 


Figure F-4. Type 0010 rdst Register Format 


oo 


K-1 Architecture Manual CONFIDENTIAL Amdahl Key Computer Labs 


F.22 rios F-9 


operand correspond to the memory physical address lines (Addr<35..4>). 
The format of rdst is shown in Figure F-5. 


0100 Read IOCA Status Register: This operation reads one of the four 
IOCA Status Registers and returns the 32-bit value to rdst. The two channel 
select bits (srcb<2..1>) are used to select the proper IOCA register. The 
definition of the IOCA status bits and their functions are described later in 
this section. The format of rdst is shown in Figure F-6. 


IOC TYPE rios Instruction Operations 


There are four different rios operations that are directed to the IOC: Read CRP Regis- 
ter File, Read CRP Register File and clear interrupt, MPSync and clear bit, and MPSync and 
set bit. 


Type Description 


1000 Read CRP Register File: This operation type reads the CRP register 
file on the IOC. The CRP register file contains the Response Packet (CRP) 
Queue Address Pointers (LVL 8-14), the NCRP pointers (LVL 5), and the 
MPSync flags (LVL 0-3). The location to be read, is addressed by the CPU 
(srcb<6..4>) and LVL (srcb<3..0>) variables contained in the ADDR field of 
the srcb operand. Interrupt levels to all CPUs remain unchanged. All read 
CRP Register File operations are allowed to occur across IOCs. 


For LVL = 8-14, the operation will read the Response Packet (CRP) 
Queue Pointer. The rdst will contain the address stored in the register file for 


WO 


33 
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UNUSED SSRP Pointer 
rdst (32) (32) 


Figure F-5. Type 0011 rdst Register Format 
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Figure F-6. Type 0100 rdst Register Format 
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the last response packet written to memory (This reflects the upper 32 bits of 
the byte address<35..4>). The format of rdst is shown in Figure F-7. 


For LVL = 5, the operation will read the Negative Response Packet 
(NCRP) Queue Pointer. The rdst will contain the address of the next NCRP. 
The format of rdst is shown in Figure F-8. 


For all other values of LVL, the operation will read all 32 bits of the data. 
SOFTWARE NOTE: Do not use this TYPE for MPSync accesses. It is 
provided for Diagnostics only. The format of rdst is shown in Figure F-9. 


1001 Read CRP Register File: Same as rios 1000 above, except that for any 
location associated with an interrupt level (2-14), that interrupt level will be 
cleared. 


When directed at the NCRP CPU location at LVL = 5, this operation 
type, will cause an IOC internal flag, which blocks the writing of response 
packets to memory, to be set (thus inhibiting NCRP writes to memory). 


1010 Read (and clear) MPSync: This operation type performs an atomic 
read-modify-write of the MPSync locations. For each IOC, there are 16 
MPSync locations in the CRP Register File and one external to the Register 


WO 


33 0 
21 0 


UNUSED CRP Queue Address 
rdst (32) (32) 


Figure F-7. Type 100x rdst Register Format 
CRP Queue Pointers 
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Figure F-8. Type 100x rdst Register Format 
NCRP Queue Pointers 
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Figure F-9. Type 100x rdst Register Format 
CRP Locations (0-4;6-7;15) 
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File. All registers are 32 bits in length, but the single 32-bit register has a 
shorter latency than the other 16 registers in the register File (shorter by 2 
IOC cycles). 


The CRP register file locations are addressed by the CPU (srcb<6..4>) 
and LVL (srcb<3..0>) variables contained in the ADDR field of the srcb 
operand. When the LVL bits (srcb<3..2>) = 0, the CPU bits (srcb<6..4>) 
concatenated with the LVL bits (srcb<1..0>) form the CRP Register File 
address of the MPSync locations. When LVL bits (srcb<3..2>) = 10, then 
the single 32-bit IOC register is accessed. DATA (srca<31..0>) provides a 
32-bit enable mask for the modification operation. The results in rdst reflect 
the old contents of the MPSync location. 


When this operation type occurs, the addressed MPSync location is read 
and returned to the CPU in rdst. In addition, the contents are modified and 
written back into the same location. Any bit whose corresponding mask 
enable bit is asserted (1)-in the DATA (srca<31..0>) field, will be set to a 
zero. Any bit whose corresponding mask enable bit is not asserted (0) in the 
DATA (srca<31..0>) field, will be unchanged. All read MPSync operations 
are allowed to occur across IOCs, but at a longer latency. 


1011 Read (and set) MPSync: This operation type operates identically to the 


Read (and clr) MPSync except that the enabled mask bits are set to one 
(instead of zero). 


F.3 Control and Status Register Formats 


This section provides descriptions of the control and status registers used in the I/O 
system. 


F.3.1 Channel Control Register Format 


Each HSC has an 8-bit Channel Control Register (CCR) associated with it. A K-1l 
processor is able to write the CCR in order to control the channel. The contents of the regis- 
ter are continuously sampled so modifications to the control bits are not synchronized with 
other IOCA operations. Bit definitions are: 


Bit 7: ICE (Input Channel Enable): This bit enables input channel operations. All 
input channel operations are disabled if this bit is ‘0’. 


Bit 6: ICAC (Input Channel Auto-Connect): This bit enables an auto-connect feature 
for the input channel. If no error conditions are present, an input channel 
request is honored and a connection is established. If error conditions are 
present, ICAC is ignored until there is another CCR WIOS instruction. 


Bit 5: ICC (Input Channel Connect): This bit controls the connection status of the 
input channel. ICC is set to complete a connection sequence and cleared to 
disconnect the channel. 


Bit 4: ICRE (Input Channel Request Enable): This bit controls how unsolicited input 
channel information is interpreted. If ICRE is set, the information is treated like 
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a request packet which contains the packet control bits used by the IOCA. If 
ICRE is clear, the information is treated like input data. The data is not used 
until a system request packet references it. 


Bit 3: OCE (Output Channel Enable): This bit enables output channel operations. 
All output channel operations are disabled if this bit is ‘0’. 


Bit 2: OCR (Output Channel Request): This bit controls the request signal of the 
output channel. If OCR is set and the connection is not established, a request 
sequence for the output channel is started. If OCR is set and the connection is 
already established, no action is taken. If OCR is clear and the connection is 
established, the connection is aborted. If OCR is clear and the connection is 
not established, no action is taken. 


Bits 1..0: Not implemented. 
All bits of all CCR’s are reset by a Channel Reset, an IOCA reset, or an I/O Sub- 


system Reset. 


F.3.2 Channel Status Register Format 


The Channel Status Register (CSR) is a 32-bit wide register associated with each 
channel. A K-1 processor is able to read the CSR in order to monitor the channel. These 
registers are read in a byte fashion so that a portion of EACH CSR is read. Channel select 
bits (srcb<2..1>) select the proper byte for reading. Bit definitions are: 


BYTE 0: 
Bit 31: ANY Error - set for ANY error on this channel. 


Bit 30: IC INTERCONNECT - value of the HSC INTERCONNECT signal for the 
input channel. 


Bit 29: IC REQUEST - value of the HSC REQUEST signal for the input channel. 
Bit 28: IC CONNECT - value of the HSC CONNECT signal for the input channel. 


Bit 27: OC INTERCONNECT - value of the HSC INTERCONNECT for the output 
channel. 


Bit 26: OC REQUEST - value of the HSC REQUEST signal for the output channel. 
Bit 25: OC CONNECT - value of the HSC CONNECT signal for the output channel. 


Bit 24: Direct/Indirect Connect Device - type of input channel device - scanned entry. 
‘1’ means Direct Connect Device. 


BYTE 1: 
Bit 23: IOC Channel Error - [OC-detected channel error. 


Bit 22: IC BCF Error - BCF-detected error for the input channel. 
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Bit 21: OC BCF Error - BCF-detected error for the output channel. 

Bit 20: IC CIO Error - ClO-detected error for the input channel. 

Bit 19: OC CIO Error - ClO-detected error for the output channel. 

Bit 18: BSI ICIF Read Error - BSI parity error on ICIF read from the IC CIO. 
Bit 17: BSI CCR Write Error - CCR write to disabled channel. 


Bit 16: SSR Counter Error - SSR wios received with counter full. 


BYTE 2: 
Bit 15: BCC Error - BCC-detected channel error. 


Bit 14: BDI Error - BDI-detected channel error. 


Bits 13..8: Not implemented. 


BYTE 3: 
Bits 7..0: Not implemented. 


A ‘0’ will be returned for bits that are not implemented. 


F.3.3 IOCA Status Register Format 


The IOCA Status Register may be viewed as four logical 32-bit wide registers asso- 
ciated with the IOCA and the HSC channels. A K-1 processor is able to read these regis- 
ters in order to monitor the IOCA and channels. Register selection is via (srcb<2..1>). Reg- 
ister formats and bit definitions are: 


IOCA STATUS REGISTER 0: 
BYTE 0: returns the CCR associated with HSC 0. 


BYTE I: returns the CCR associated with HSC 1. 
BYTE 2: returns the CCR associated with HSC 2. 
BYTE 3: returns the CCR associated with HSC 3. 


IOCA STATUS REGISTER 1: 
BYTE 0: returns the SSR Counter associated with HSC 0. 
BYTE I: returns the SSR Counter associated with HSC 1. 
BYTE 2: returns the SSR Counter associated with HSC 2. 
BYTE 3: returns the SSR Counter associated with HSC 3. 
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IOCA STATUS REGISTER 2: 
Bit 31: ANY IOCA error other than channel errors which are reported via CSR. 
Bit 30: BSI-detected SDATA parity error. 
Bit 29: BSI-detected RDATA parity error. 
Bit 28: BCC-detected IOS error. 
Bit 27: IOCA-detected grant tag error. 
Bits 26..0: Not implemented. 


IOCA STATUS REGISTER 3: 
Bits 31..0: Not implemented. 


A ‘0’ will be returned for bits that are not implemented. 
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G.1 Introduction 


The Console interfaces to K-1 processors in two ways: through scan strings and 
through the TTY interface. 


At the time this manual went to press, this chapter was still incomplete. 


G.2 Interrupts 


The Console is capable of interrupting a K-1 processor in several ways. First, the 
Console can send a NMI (Non-Maskable Interrupt). Second, the TTY interface will inter- 
rupt the processor when the processor’s TTY read port is full (RPI), or when the processor’s 
TTY write port is empty (WPI). The section on Traps, Interrupts, and Machine Checks 
in Chapter 2 discusses interrupts in more detail. 


The NMI interrupt is intended for panic situations, such as an emergency shutdown 
due to an imminent power or cooling system failure. The TTY interface is discussed in more 
detail in the next section. 


G.3 TTY Interface 


The TTY interface provides a way for the Console and a K-1 processor to communi- 
cate. Each K-1 processor has a separate TTY port to the Console. While each processor 
only has one write port (the path from the processor to the Console), there are three different 
read ports available to each processor: the high-priority read port, the low-priority read port, 
and the debugger read port. 


Read ports... 
Write port... 
The rfec and wfec instructions read and write a processor’s TTY port. 
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G.4 Performance Counters —~ 


G.5 Clock Control 


halt in supervisor mode 
data watchpoint 
trap in Trap State 


G.6 Scan Control and Hidden State 


reset state of machine 
contents of registers: 
Processor Number within MP 
Number of processors in MP 
Date/time 
MATR (Memory Address Translation Register) 
Software revision level of FE software 


G.7 Bootstrap Procedure 


bootstrap procedure 


synchronous initialization of uptime counters in all processors 


K-1 Architecture Manual CONFIDENTIAL Amdahl Key Computer Labs 


Appendix H. Implementation Dependencies 


H.1 The Version 1 Implementation 


This appendix describes those aspects of the K-1 architecture that are implemented 


differently in the Version 1 implementation than described in the main body of the K-1 Archi- 
tecture Manual. Copies of the appropriate figures with the field sizes and restrictions for this 
implementation are included for clarity. 


[2-1] 


[2-2] 
[2-3] 
[2-4] 


[2-5] 


Only the first 32 registers are supported (registers r0 through r31). References 
to registers r32 through r63 are illegal and will produce unpredictable results. 


Only ELF flags corresponding to registers r0 through r31 exist. 
Addresses are 36 bits. 


Virtual and physical page numbers are 20 bits. This limits the maximum number 
of pages available to a process to 1024K. Figure 2-6 appears as follows: 


Virtual Page Number Page Offset 


(20) (16) 


Figure 2-6. Virtual Address Format 


In supervisor mode (and in Trap State), the first 32M bytes of virtual instruction 
space are mapped to the first 32M bytes of physical memory. References outside 
of this range will cause an instruction map miss trap. 
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[2-6] Figure 2-7 appears as follows: 


Virtual Page -Number Physical Page 


(20) (20) 


Figure 2-7. Instruction Page Table Format 


[2-7] Both the Virtual Page Number and the Physical Page Number fields are 20 bits 
long and the remaining high-order bits must be set to zero. 


[2-8] Table 2-3 appears as follows: 


Table 2-3. Instruction Page Table Size Specifier 


ae | # of Pages Mapped 


0 1 

1 4 

2 16 

3 64 
4-7 Illegal 


If the SS field has the value i, then this page table entry references 
pages. The low 2i bits of both the virtual and physical page numbers must be ze- 


271 contiguous 


ro. In other words, a 271_page table entry must be 27\_page aligned. Each in- 
struction page table entry can map up to 64, 64K-byte pages or 4M bytes. (That 
is, each entry can map as much as bits 21..3 of the PC.) With eight entries, a to- 


tal of 32M bytes can be mapped. 


[2-9] The data page table is a 16K-entry, one-way set-associative (direct-mapped) 
cache. 


[2-10] The data page table hashing function is the exclusive-OR of the low-order 14 bits 
of the virtual page number and the zero-extended Process Key field. 
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[2-11] 


The Version 1 Implementation H-3 


Figures 2-8 and 2-9 appear as follows: 


44 a2 
87 65 


11 
65 0 


Virtual Page Number Unused 


(20) (16) 


Figure 2-8. Virtual Page Number 


0000 
3210 


op; 
nt 
NAW 
Mw 
nr 
an 


Unused Physical Page Number Unused d 
(16) (20) (16) : 


[2-12] 


[2-13] 


[2-14] 


[2-15] 


Figure 2-9. Data Page Table Format 


Both the Virtual Page Number and the Physical Page Number are 20 bits long 
and the remaining high-order bits must be set to zero. 


The cache line size for the instruction and data caches is 256 bytes. The memory 
transfer sub-line size is 64 bytes. 


The instruction cache holds 4K lines (1M bytes) and is addressed by bits 19..0 of 
a physical address. The data cache holds 8K lines (2M bytes) and is addressed 
by bits 20..0 of a virtual address. 


The instruction stache line size is 64 bytes. The instruction stache stores 16 lines 
(1K bytes). 


The Uptime Counter is 54 bits. The interval field of the Interval Timer register is 
27 bits. Only these 27 bits are compared with the low-order 27 bits of the Uptime 
Counter. Figures 2-10 and 2-11 appear as follows: 
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Unused 


(10) 


Figure 2-10. Uptime Counter 


Unused Interval 


(36) (27) 


[2-16] 


[3-1] 


Figure 2-11. Interval Timer Register 


Attempts to write into unused bits of the Processor Status register may cause a 
trap. This issue was still open at the time this document was released. 


Only the low 34 bits of the Absolute Branch Address are implemented. The sev- 
en unused high-order bits must be zero. Figure 3-6 appears as follows: 


Absolute Branch Address 


(34) 


[5-1] 
[5-2] 


Figure 3-6. Absolute Branch Instruction Format 


Refer to [2-3]. 
Refer to [2-10]. 
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[5-3] 


The Version 1 Implementation H-5 


Only the low 33 bits of the Data Watchpoint Word Address field are implement- 
ed. The 12 unused high-order bits must be zero. Figure 5-1 appears as follows: 
44 33 0000 
87 65 


Data Watchpoint Word Address 


(33) 


[5-4] 
[5-5] 


[5-6] 
[8-1] 
[8-2] 
[9-1] 
[9-2] 
[9-3] 
[9-4] 


[9-5] 
[9-6] 
[9-7] 
[9-8] 
[9-9] 


[9-10] 
[10-1] 


Figure 5-1. Data Watchpoint Table Entry Format 


The size of the Data Watchpoint Table is 2 (probably). 


The low-order 8 bits of the address must be zero or an illegal address trap will oc- 
cur. 


Bits 10..8 of the address are replaced with the unique CPU ID. 
Refer to [2-15]. 

Refer to [2-17]. 

Refer to [2-6] through [2-8]. 

Refer to [2-11]. 

Refer to [2-10]. 


The instruction cache is addressed by bits 19..8 of the physical address corre- 
sponding to srca, and the addressed cache line is killed. Since an entire 256 byte 
cache line is killed, bits 7..0 are not needed to address the instruction cache, 
though they are relevant if execution is continued sequentially from after the tar- 
get of the ickill. In supervisor mode, since the physical address is the same as 
the virtual address, ickill may be used to kill physical memory, but only up to the 
32 MB limit of supervisor mode addressing. 


Bits 20..8 of srca give the cache line index. 

The low-order 8 bits of srca must be zero, or else an illegal address trap occurs. 
Refer to [2-12]. 

pel is not implemented and acts like a nop. 


iskill takes effect on the third instruction word fetched after the instruction con- 
taining the iskill. 


There are two delayed write buffers. 


Refer to [5-6]. 
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Appendix I. Floating-Point Operation Details 


This appendix contains details of how floating-point operations are carried out in the 
K-1. Tables are presented showing the input/output relationships for all floating-point oper- 
ations. In conjunction with the IEEE floating-point standard document, ANSI/IEEE 754- 
1985, this appendix completely specifies the behavior of K-1 floating-point. 


I.1 Abbreviations 


The following abbreviations are used throughout this appendix: 


NaN Not-a-Number (either quiet or signaling) 

QNaN Quiet NaN 

SNaN Signaling NaN 

Z Zero 

FNZ Finite nonzero representable floating-point value 
oo Infinity 

BIAS Exponent bias 


INV OP Invalid operation exception 

DIV BYZ__ Division by zero exception 

FOVF Floating-point overflow exception 
FUDF Floating-point underflow exception 
INEXACT _Inexact exception 


I.2 Use of NaN 


As described in the main body of the K-1 Architecture Manual, there are two kinds of 
NaNs recognized by the K-1: quiet and signaling. Section 2.3.2 specifies which bit patterns 
are recognized as QNaNs and as SNaNs, and what bit pattern is generated for QNaNs. As 
described there, except for data moving instructions, when the K-1 generates a NaN, it is al- 
ways a QNaN with a sign bit of zero. Note that data moving instructions (such as negate) 
will never “generate” a NaN (though they move an arbitrary NaN from their input to their 
output). 


The floating-point negate function (neg.d and neg.s) is considered to be a data mov- 
ing operation and therefore behaves differently than other floating-point instructions when 
presented with a NaN. Negate changes the sign of any input operand, including a NaN, and 
does not cause an invalid operation exception, even when given a signaling NaN. 


Note that except for the negate function, no bits of the fraction part of an input NaN 
are preserved by any of the floating-point operations. 
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I-2 Floating-Point Overflow and Underflow 1.3 


~ 


1.3 Floating-Point Overflow and Underflow 


As mentioned in the section on Processor Status in Chapter 2, the K-1 supports 
floating-point exception traps, but they are not IEEE compatible. (The IEEE standard does 
not require any support for traps.) In particular, the behavior on floating-point overflow and 
underflow traps differs from that recommended by the IEEE standard. The results stored on 
floating-point overflow and underflow when trapping is not enabled are precisely in agree- 
ment with the standard. 


The result stored for a floating-point operation that overflows when floating-point 
overflow traps are not enabled depends upon the sign of the infinitely precise result and the 
rounding mode in effect as described in section 7.3 of the standard. If floating-point overflow 
traps are enabled, then the result stored for an overflowed operation will contain the sign bit 
and the correctly rounded fraction, but a truncated exponent field. The “correct” unbiased ex- 
ponent may be calculated by subtracting BIAS and then, if the result is negative, adding 
2*(BIAS + 1). 


Floating-point underflow is defined in terms of tininess and loss of accuracy. The K-1 
defines a result to be tiny whenever the infinitely precise unrounded result, r, satisfies the 


condition -2Emin < y < 2Emin. Loss of accuracy is defined to occur when an inexact result is 
delivered. 


When floating-point underflow traps are not enabled, a floating-point underflow ex- 
ception is signaled whenever both tininess and loss of accuracy are detected. The correctly 
rounded result (which might be zero, +2Emin, or a denormalized number) will be stored and 
the floating-point underflow and inexact exception flags will be set. When floating-point un- 
derflow traps are enabled, floating-point underflow exceptions require only that tininess be 
detected. In this case, the correctly rounded denormalized result (which may be zero) is 
stored, instead of the rebiased result suggested by the IEEE standard. Either a floating- 
point underflow or a floating-point underflow and inexact trap occurs, depending on whether 
the result was inexact. 


Note that floating-point underflow and overflow exceptions frequently occur along 
with inexact exceptions. If a trap occurs in such a case, the Trap Locators (see Chapter 2 
and Appendix D) only store an indication that the combined condition (e.g., floating-point un- 
derflow and inexact) occurred, even though only one of the two exceptions may have been en- 
abled to trap. 


1.4 Floating-Point Operation Tables 


This section contains a table for each floating-point operation showing the results and 
exceptions generated for all operands. For each table entry, the first line shows the result, 
and subsequent (upper case) lines indicate the exceptions. Results and exceptions that de- 
pend on the values of the FNZ inputs are given in parentheses. Footnotes explain the rea- 
sons for unusual results and exceptions. 
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1.4.1 Floating-Point Addition 1-3 


1.4.1 Floating-Point Addition 
The results for the floating-point addition instructions (add.d and add.s) are shown 


in Table I-1. Since addition is symmetric, only the upper triangular portion of the table is re- 
quired. 
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I-4 Floating-Point Addition 1.4.1 


Table I-1. Addition Results 


See ewww ween 
’ 


+ FNZ (or +00) 
(FOVF) 
(INEXACT) 


note 2 


+ FNZ 


- FNZ (or -0) 
(FOVF) 
(INEXACT) 
note 2 


Ey cS 


See ww wee ste wee eee eee eee eee see ates esse ense INV aye | oe 
SNaN QNaN | QNaN 
INV OP| INV OP 
QNaN 


Se ee ee ek ica 


Note 1: Section 6.3 of the IEEE standard specifies that if the sum of two 
operands with opposite signs is exactly zero, the sign of the result 
depends on the rounding mode. If the floor rounding mode is used, 
the sign is negative; for all other rounding modes the sign is positive. 


Note 2: Standard Section 7.5 specifies that if the sum overflows without an 
overflow trap, the INEXACT exception will also be signaled, and the 
result may be the same-signed infinity or the largest-magnitude 
number, depending on the rounding mode as specified in the IEEE 
standard section 7.3. 


Note 3: Underflow can never be inexact in addition/subtraction. Thus, 
underflow is never signaled when its trap is not enabled: the 
condition for signaling in that case requires both underflow and 
inexact. A trapped underflow will never be accompanied by 
INEXACT. 
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is Floating-Point Negation I-5 


1.5 Floating-Point Negation 


The floating-point negate instructions (neg.d and neg.s) simply complement the sign 
bit of their input operand, regardless of the value it may represent. In particular, negate does 
not cause an invalid operation exception when given an SNaN, nor does it output a QNaN in 
this case. 


1.6 Floating-Point Subtraction 


The floating-point subtraction operation calculates srcb - srca. This operation is de- 
fined to be srcb plus the negation of srca. The results of floating-point subtraction instruc- 
tions (sub.d and sub.s) can therefore be determined from the above rule for negation togeth- 
er with Table I-1. 
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I-6 Floating-Point Multiplication LT 


1.7 Floating-Point Multiplication 


The sign of the result for all input operands is always the exclusive-or of the sign 
bits of the input operands. The results for the floating-point multiplication instructions 
(mult.d and mult.s) are shown in Table I-2. 


Table I-2. Multiplication Results 


Note 1: Section 7.5 of the IEEE standard specifies that if the result overflows 
without an overflow trap, the INEXACT exception will be signaled, 
and the result may be the same-signed infinity or the largest- 
magnitude number, depending on the rounding mode as specified in 
the IEEE standard section 7.3. 


Note 2: An underflowed number may become zero through denormalization 


loss. Inexactness is one of the conditions detected for underflow 
(when traps are not enabled), so an FUDF exception could be 
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1.8 Floating-Point Division I-7 


1.8 Floating-Point Division 
The sign of the result for all input operands is always the exclusive-or of the sign 


bits of the input operands. The results for the operation srca + srcb (the floating-point divide 
instructions div.d and ;div.s) are shown in Table I-3. 


Table I-3. Division Results 


srcb Z 


FNZ oo SNaN QNaN 
QNaN QNaN QNaN 
INV OP INV OP 


srca 


Z 
FNZ (or 2 or Z) 
°o (FUDF) QNaN 
FNZ DIV BY Z (FOVF) Z INV OP QNaN 
(INEXACT) 
note 1,2 
oo oo QNaN QNaN QNaN 
note 3 INV OP INV OP 
SNaN QNaN QNaN QNaN QNaN QNaN 
INV OP INV OP INV OP INV OP INV OP 
QNaN 


QNaN QNaN QNaN QNaN QNaN 
INV OP 


Note 1: Section 7.5 of the IEEE standard specifies that if the result overflows 
without an overflow trap, the INEXACT exception will be signaled, 
and the result may be the same-signed infinity or the largest- 
magnitude number, depending on the rounding mode as specified in 
the IEEE standard section 7.3. 


Note 2: An underflowed number may become zero through denormalization 
loss. Inexactness is one of the conditions detected for underflow 
(depending on whether traps are enabled), so an FUDF exception 
could be accompanied by an INEXACT exception (IEEE standard 
section 7.4). 


Note 3: The IEEE standard section 7.2 specifies that the DIV BY Z 
exception should not be signaled in the case of a divisor equal to zero 
and a dividend equal to infinity. The correctly signed infinity will be 
returned, and no exception will be signaled. 
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1-8 Floating-Point Square Root 9 


1.9 Floating-Point Square Root 


The results for the operation srca’” (the floating-point square root instructions sqrt.d 
and sqrt.s) are shown in Table I-4. Note that square root can never underflow. 


Table I-4. Square Root Results 


Operand Result Value and Exceptions 


FNZ in the range rounded result 
Z<FNZ<co (INEXACT) 
FNZ in the range QNaN 
-0o < FNZ <Z INV OP 


SNaN 
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1.10 Floating-Point Conversions 1-9 


1.10 Floating-Point Conversions 


Hardware support is provided for conversions between floating-point formats and for 
conversions between floating-point and integer formats. 


1.10.1 Conversion Between Floating-Point Formats 


The results of conversion from double precision to single precision (the cvtd.s instruc- 
tion) are shown in Table I-5. The results of conversion from single precision to double preci- 
sion (the evts.d instruction) are shown in Table I-6. 


Table I-5. Double to Single Conversion Results 


Operand Result Value and Exceptions 


FNZ smaller than Z, or denorm, or smallest normalized number 
smallest representable FUDF, (INEXACT) 
nonzero single precision note 1 


number (FNZS) 


FNZ greater than co or largest finite number 
largest representable FOVF, (INEXACT) 
finite single precision 


number (FNZL) 


Note 1: If an underflow trap is enabled, then the inexact exception is not a 
required condition for underflow. An INEXACT exception could also 
occur and be signaled (IEEE standard section 7.4). 


Note 2: Section 7.5 of the IEEE standard specifies that if the result overflows 
without an overflow trap, the INEXACT exception will be signaled, 
and the result may be the same-signed infinity or the largest- 
magnitude number, depending on the rounding mode as specified in 
IEEE standard section 7.3. 
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I-10 Conversion Between Floating-Point Formats 1.10.1 


Table I-6. Single to Double Conversion Results 


Operand Result Value and Exceptions 


Table I-7. Signed or Unsigned Integer to Floating-Point Conversion Results 


Operand Result Value and Exceptions 


any nonzero 
integer 


Note 1: Since the integer representation has 64 bits of precision and the 
floating-point format has 24 or 53, the floating-point format 
representation can be inexact, in which case an INEXACT exception 
is signaled. 
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1.10.2 Conversion Between Floating-Point and Integer Formats I-11 


1.10.2 Conversion Between Floating-Point and Integer Formats 


The results of conversion from signed integer to double or single precision floating- 
point format (the cvtl.d and cvtl.s instructions) are shown in Table I-7. The results of con- 
version from unsigned integer to double precision floating-point (the cvtul.d instruction) can 
also be seen in Table I-7. The results of conversion from double or single precision floating- 
point to signed integer format (the cvtd.I and cvts.I instructions) are shown in Table I-8. 


Table I-8. Floating-Point to Signed Integer Conversion Results 


Operand Result Value and Exceptions 


FNZ of greater magnitude largest-magnitude same-signed integer 
than can be represented in integer INV OP 
format (either positive or negative: note 1 
note positive and negative maximum 


possibly rounded integer 
(INEXACT) 
note 2 


largest magnitude same-signed integer 
INV OP 
note 1 


Note 1: Conversion to integer of infinity and numbers too large to be 
represented in the destination format will be considered invalid 
operations. According to the IEEE standard section 7.1, if conversion 
of a floating-point number to an integer format results in an infinity, a 
NaN, or overflows and cannot be represented in the integer format, 
and this cannot be signaled any other way, then the INV OP 
exception will be signaled. 


Note 2: If the result cannot be converted back into floating-point format and 


give the same original floating-point value, then INEXACT is 
signaled (IEEE standard section 7.5). 
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absolute branch instruction format 3-4, 3-5, 7-1, 7- 
2, 7-4 
access 
supervisor mode 2-9, 2-10, 2-11, 2-12, 2-15, 2- 
21, 2-22, 2-23, 2-26, 5-1, 5-4, 5-47, 5- 
48, 5-53, 9-8 
user mode 2-9, 2-10, 2-11, 2-12, 2-13, 2-15, 2- 
19, 2-22, 2-23, 5-1, 5-2, 5-3, 5-4, 5-53 
add instruction C-4 
add.d instruction C-2, I-3 
add.s instruction I-3 
addc instruction C-4 
address 
absolute branch 3-5, 7-2, 7-4, H-4 
alignment 2-8 
mapping 2-10 
physical 2-10 
program 2-16, H-2 
reference 2-9 
virtual 2-9, 2-10 
address mapping 2-10 
data 2-11, 5-1 
instruction 2-10 
addressing modes 2-8 
addt instruction D-7 
alignment 2-8 
Arithmetic Exception Flags 2-20, 2-21, 4-1, C-9 
Arithmetic Trap Enables 2-20, 4-1 


bank conflict C-3, C-11 
base 2-9 
binary semaphore 2-16 
boof instruction C-4 
bootstrap 2-29 
bpt instruction 2-25, 2-28, 2-29, C-9, D-3 
branch 
target 2-18, 9-4 
branch address 
absolute 3-5, 7-2, 7-4, H-4 
branching 
conditional 2-7, 4-1 
delayed 2-17 
Byte Order Low-to-High 2-3, 2-8, 2-20, 2-21, 2- 
22, 5-1, 5-2, D-15 
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cache 
data 2-1, 2-14, C-3, C-7, C-11 
instruction 2-1, 2-14, C-7, C-10 
cache coherence 2-1, 2-15, 2-16, 9-8, E-3 
cache effects 2-14 
cache line 2-14, 5-3, 5-4, 5-53, 9-7, 9-8, 9-9, 9-10, 
12-3, E-1, H-3, H-5 
cache miss 9-8, 9-9, C-3, C-7, C-10, C-11, C-12 
call instruction 3-4, 3-5, 3-6, 7-1,C-4, D-6 
check trap 6-61 
chk instruction D-7 
clock 
cycle 2-23, C-1 
compare instructions C-4 
floating-point 4-1 
integer 6-16 
conditional branching 2-7, 4-1 
conditional execution 2-18, 4-1 
conditional short constant instruction format 3-2, 
3-3 
console 
interrupt 2-24, 2-26 
constants 2-1, 2-6, 2-8, 2-9, 5-2 
long 3-3, 12-3 


short 3-3 
counter 
uptime 2-23 


cvtd.l instruction I-11 
cvtd.s instruction I-9 
cvtl.d instruction I-11 
cvtl.s instruction I-11 
cvts.d instruction I-9 
cvts.l instruction I-11 
cvtul.d instruction I-11 
cycle 2-23, C-1 


data 
shared 2-14, 2-15, 5-1, 5-44, 5-47, 5-48, 9-8 
data address mapping 2-11 
data cache miss 9-8, 9-9, C-3, C-7, C-11, C-12 
data map miss 2-13, 2-24, 5-2, 5-3, 9-9 
data page table, see page table 
datatypes 2-3 
data watchpoint 5-1, 5-3, 5-4, 5-53 
data watchpoint table 5-53 
decode trap 2-25, 2-26, 2-27, 2-28, D-1, D-3, D-4 
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delay instructions 2-17, 2-18, 7-1 
first 2-17, 2-18 
second 2-17, 2-18 
delay slot 2-17, 2-18, 7-1 
first 2-17, 2-18 
second 2-17, 2-18 
delayed branching 2-17 
delayed execution control field 2-17, 2-28, 3-4, 10- 
8 
delayed write buffer 9-8, C-11, H-5 
dflush instruction 2-12, 2-13, 2-16, 2-29, 5-2, 5-3, 
5-53, C-12, D-13 
shared 2-16, 9-8 
disable bit 2-17, 2-18, 2-25, 2-28, 2-29 
div.d instruction C-6, I-7 
div.s instruction C-6, I-7 
divssr instruction 2-21, C-6 
divsst instruction 2-21, C-6 
dshfl instruction C-4, C-6 
dshfr instruction C-4, C-6 


early load 2-18 
Early Load Alignment Trap 2-19, 2-20, 2-23, 5-3, 
5-4, D-15 
ECC 5-2, 5-3, 5-4 
echk instruction 2-7, 2-19, D-3, D-12 
echk trap 2-19, 5-49, 5-50 
ELF flags 2-7, 2-19, 2-23, 2-29 
eload instruction 2-7, 2-8, 2-12, 2-13, 2-19, 2-23, 
5-1, 5-2, 5-4, D-12 
special error conditions 5-4 
exception 
floating-point 2-20, 4-1, 4-5, 4-13 
execution 
conditional 2-18 
program 2-16 
exts instruction 2-17, 2-22, 2-28, 3-4, 3-6, 5-1, 5- 
2, 5-3, 5-53, C-10, D-6, D-15 
exts instruction format 3-4, 3-6 


fdst 3-1 
flags 
Arithmetic Exception Flags 2-20, 2-21, 4-1 
ELF flags 2-7, 2-19, 2-23, 2-29 
Processor Status 2-7, 2-19, 2-20, 3-2, 4-1 
floating-point 
addition I-3 
conversion I-9 
division I-7 
multiplication I-6 
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negation I-5 
square root 1-8 
subtraction I-5 
trap 2-26, 4-5, 4-13, D-1, D-3, D-4, D-7 
floating-point add functional unit 2-1, 2-2, 6-1, C- 
2, C-3, C-5, C-9, C-10, D-1, D-5, D-8 
floating-point divide/square root functional unit 2- 
1, 2-2, 6-1, C-4, C-5, C-9, C-10, D-1, D-2, 
D-5 
floating-point multiply functional unit 2-1, 2-2, 6- 
1, C-3, C-5, C-9, C-10, D-1, D-8 
floating-point multiply unit D-5 
floating-point numbers 2-4 
floating-point operations 1-2 
format code 2-6, 3-1, D-8 
format, see instruction format 
fsrc 3-1 
functional unit 2-1, 2-26, C-1, C-2, C-3, C-5, C- 
10, D-2 
fetch, see instruction fetch unit 
floating-point 4-13 
floating-point add 2-1, 2-2, 6-1, C-2, C-3, C- 
5, C-9, C-10, D-1, D-5, D-8 
floating-point divide/square root 2-1, 2-2, 6-1, 
C-4, C-5, C-9, C-10, D-1, D-2, D-5 
floating-point multiply 2-1, 2-2, 6-1, C-3, C- 
5, C-9, C-10, D-1, D-5, D-8 
integer 2-1, 2-2, 6-1, C-3, C-4, C-5, C-9, C- 
10, D-1, D-5, D-8 
issue, see issue unit 
latency 2-25, C-2, C-3, C-4, C-8 
load/store 2-1, 2-2, C-3, C-5, C-7, C-10, C- 
11, D-1, D-5 
pipelines C-2, C-3, C-7, C-9, C-10, D-11 
read port C-5, C-6 
wrapping C4 


general registers 2-7 


halt 2-26 
halt instruction C-5 


I/O, see input/output 

10 instruction 2-16, 2-25, 3-1, 3-2, 3-4, 3-6, C-1, 
C-5, C-6, C-9, D-6, D-7, D-8, D-9 

Il instruction 2-16, 2-25, 3-1, 3-2, 3-4, 3-6, C-1, 
C-5, C-6, C-9, D-6, D-7, D-8, D-9 

ickill instruction 2-29, 3-4, 3-6, C-10, D-6 
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illegal access 2-13, 5-2, 5-3, 5-4, 9-9 
illegal address 2-8, 2-19, 2-23, 5-1, 5-2, 5-3, 54 
illegal instruction 2-18, 2-25, 3-6, 7-1, 13-1 
index 2-9 
input/output 2-1, 2-23 
interrupts 2-23, 2-24, 2-26 
instruction 3-1 
disabled 2-18, 7-1 
I0 2-16, 2-25, 3-1, 3-2, 3-4, 3-6, C-1, C-5, C- 
6, C-9, D-6, D-7, D-8, D-9 
Il 2-16, 2-25, 3-1, 3-2, 3-4, 3-6, C-1, C-5, C- 
6, C-9, D-6, D-7, D-8, D-9 
issue 2-1, C-1, C-3, C-6 
instruction address mapping 2-10 
instruction cache C-7, C-10 
instruction cache miss C-10 
instruction fetch 2-16, 2-17, 2-18, 2-25 
trap 2-26, 2-27, 2-28, D-1 
unit 2-1, 2-2, C-1, C-5, C-6 
instruction formats 2-6, 2-7, 3-1 
absolute branch 3-4, 3-5, 7-1, 7-2, 7-4 
conditional short constant 3-2, 3-3 
exts 3-4, 3-6 
long constant 2-25, 3-3, 3-4, 3-5, 3-6, 12-3, C- 
8, C-9 
PC-relative branch 3-1, 3-4, 3-5, 3-6, 7-1 
register 3-2, 3-3, C-8, C-9 
register branch 3-4, 3-6, 7-1, 7-3, 7-5, 9-4 
short constant 3-2, 3-3, C-8, C-9 
unconditional short constant 3-2, 3-3 
instruction issue C-1, C-2, C-3, C-4, C-5, C-6, C- 
7, C-8, C-9, C-10, C-11 
instruction map miss 2-17, 2-25, 2-28, 9-4, H-1 
instruction opcode 3-1 
instruction page ID 2-10, 2-15, D-5 
instruction page table, see page table 
Instruction Queue C-8 
instruction stache 2-15, C-1, C-7, C-8, C-10 
instruction stache miss C-7, C-8 
instructions 
bit count 6-54 
boolean 6-39 
cache 9-1 
check 6-61 
data moving 6-29 
ELF flag 5-49 
flag 6-58 
floating-point 4-1 
floating-point compare 4-1 
floating-point computation 4-13 
floating-point conversion 4-5 


Index 


integer compare 6-16 
load/store 5-1 
miscellaneous 12-1 
processor status register 8-1 
reverse 6-54 
shift 6-48 
timer 8-1 
transfer of control 7-1 
trap 10-1 
virtual memory 9-1 
Integer Divide Trap Enable 2-21, 6-14, 6-15 
integer functional unit 2-1, 2-2, 6-1, C-3, C-4, C- 
5, C-9, C-10, D-1, D-5, D-8 
integer trap 2-26, D-1, D-7 
integers 2-3 
interlock C-2,C-3 
flag 12-3,C-4 
I0/fll_ C-5 
register C-3 
interprocessor synchronization 2-16 
interrupt 2-25, 2-26, D-1 
console 2-24, 2-26 
VO 2-23, 2-24, 2-26 
interval timer 2-24, 2-26, D-1 
interval timer interrupt 2-24, 2-26, D-1 
iskill instruction 2-15, 9-4, C-10 
issue 2-16, 2-21 
instruction 2-1, 2-21, 2-22, 2-25, 2-26, 2-27, 
C-1, C-2, C-3, C-4, C-5, C-6, C-7, C- 
8, C-9, C-10, C-11 
unit 2-1, 2-2, C-1, C-7, C-8, C-9, C-10 


jump instruction 3-4, 3-5, 3-6, D-6 


latency 2-25, C-2, C-3, C-4, C-8 

Idecc instruction 5-4, D-12 

Idnecc instruction 2-12, 5-4, D-12 

Idpage instruction 2-13, 2-29, C-11, D-13, D-14 

line, see cache line 

lipage instruction 2-10, 2-29, C-5, C-7 

load 

early 2-18 

load instruction 2-7, 2-12, 2-13, 2-18, 2-19, 2-23, 
5-1, 5-2, 5-4, C-7, C-11, D-12 

load.| instruction C-10 

load/store functional unit 2-1, 2-2, C-3, C-5, C-7, 
C-10, C-11, D-1, D-5 


YO 11-1 load/store queue 2-26, 2-27 
integer 6-1 load/store tap 2-26, D-1, D-3, D-4, D-10, D-11, D- 
integer arithmetic 6-1 12 
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loadcpu instruction 2-12, 5-1, 5-2,54, D-12 
long constant instruction format 2-25, 3-3, 3-4, 3- 
5, 3-6, 12-3, C-8, C-9 


machine check 2-24, 2-27, D-1, D-2 
external D-2 
functional unit D-2 
instruction cache parity error D-2 
internal pipeline D-2 
trap-class D-2 
map(addr) 3-1, 5-2 
mem(size,paddr) 3-1, 5-2 
memory 2-7 
virtual 2-9 
memory precision 2-9 
memory transfer sub-line, see sub-line 
memory-related parity error 5-3, 54 
miss 
datacache 9-8, 9-9, C-3, C-7, C-11, C-12 
datamap 2-13, 5-2 
instruction cache C-10 
instruction stache C-7,C-8 
mode 
addressing 2-8 
reference 2-9 
supervisor 2-9, 2-22 
user 2-9, 2-22, 2-23 
move instruction C-2 
move.d instruction 6-1 
mult.d instruction 1-6 
mult.s instruction I-6 


NaN 
quiet 2-4 
signaling 2-4 
usage I-1 
values defined 2-5 
neg.d instruction I-1, I-5 
neg.s instruction I-1, I-5 
nonexistent memory 2-19, 5-2, 5-3, 5-4 
nop instruction 2-25, C-8, C-9 


opcode 3-1 


page 2-10 
invalid 2-13 
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shared 2-12, 2-13, 5-1, 5-44, 5-47, 5-48, 9-8 
page control bit 2-12, 2-13 
page map, see page table 
page map/cache tag parity error 5-3, 5-4 
page table 2-10 
data 2-10, 2-12, 2-13, 2-15, 2-22, 2-29, 5-1, 5- 
2, 5-3, 5-4, 9-3, 9-8 
instruction 2-10, 2-11, 2-15, 2-29, 9-2, D-5 
PC 2-16, 2-17, 7-1, H-2 
Restart PC 2-25, 2-26, 2-28, 10-8, D-15 
retum PC 7-1 
PC-relative branch instruction 7-1 
PC-relative branch instruction format 3-1, 3-4, 3- 
5, 3-6, 7-1 
pel instruction 2-12, 2-13, 5-2, 5-53, C-11, C-12, 
D-12 
physical addresses 2-10 
pipelining C-2 
precision 2-8, 2-9 
privilege violation 2-18, 2-25, 7-1, 13-1 
Process Key 2-12, 2-13, 2-20, 2-21, 5-2, 9-3, D-15 
Processor Priority Level 2-20, 2-23, 2-27, 8-4 
Processor Status 2-8, 2-9, 2-11, 2-12, 2-19, 2-21, 2- 
29, 4-1, 4-5, 5-1, 5-2, 8-1, 8-2, 8-3, H-4 
Arithmetic Exception Flags 2-20, 2-21, 4-1, C- 
9 
Arithmetic Trap Enables 2-20, 4-1 
Byte Order Low-to-High 2-3, 2-8, 2-20, 2-21, 
2-22, 5-1, 5-2, D-15 
Early Load Alignment Trap 2-19, 2-20, 2-23, 
5-3, 5-4, D-15 
flags 2-7, 2-19, 2-20, 4-1 
Integer Divide Trap 2-21 
Integer Divide Trap Enable 6-14, 6-15 
Process Key 2-12, 2-13, 2-20, 2-21, 5-2, 9-3, 
D-15 
Processor Priority Level 2-20, 2-23, 2-27, 8-4 
Processor Version Number 2-19, 2-20, 8-3 
Rounding Mode 2-20, 2-21, 4-5, 6-15, 8-5 ~ 
Small Address Compatibility Mode 2-9, 2-20, 
2-22, 2-23, 5-1, 5-2, 7-1, D-15 
Trace Enable 2-20, 2-21 
Trace Pending 2-20, 2-21, 2-22 
User Mode Load 2-11, 2-12, 2-20, 2-22, 2-23, 
5-1, 5-44, 9-10, D-15 
User Mode Store 2-11, 2-12, 2-20, 2-22, 2-23, 
5-1, 5-44, 9-7, 9-8, 9-9, D-15 
User Protection 2-11, 2-12, 2-13, 2-20, 2-22, 
2-23, 5-1, 5-2, 9-7, 9-9, 9-10, D-15 
Processor Version Number 2-19, 2-20, 8-3 
program addresses 2-16, H-2 
program counter, see PC 
program execution 2-16 
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read port C-5, C-6 
reference 2-9 
mode 2-9 
shared 2-12, 2-15 
supervisor mode 2-9, 2-10, 2-11, 2-12, 2-15, 2- 
21, 2-22, 2-23, 2-26, 5-1, 5-4, 5-47, 5- 
48, 5-53, 9-8 
user mode 2-9, 2-10, 2-11, 2-12, 2-13, 2-15, 2- 
19, 2-22, 2-23, 5-1, 5-2, 5-3, 5-4, 5-53 
register branch instruction format 3-4, 3-6, 7-1, 7- 
3, 7-5, 9-4 
register instruction format 3-2, 3-3, C-8, C-9 
registers 2-1, 2-6, 2-9 
general 2-7 
reserved C-3 
relf instruction 2-7, 2-19, D-12 
reserved 
register C-3 
reset operation 2-29 
reset trap 2-26, 2-27, 2-29 
Restart PC 2-25, 2-26, 2-28, 10-8, D-1, D-15 
return PC 7-1 
rfec instruction C-12, D-12 
rios instruction 2-23, C-12, D-13 
rounding 2-21, I-2, I-4, I-6, I-7, I-9 
Rounding Mode 2-20, 2-21, 4-5, 6-15, 8-5 
rps instruction 2-19, 2-26, C-4, C-9, D-15 
rtrpd instruction 2-26, 2-27, C-4, D-1, D-10, D- 
12, D-14 
rut instruction 2-24, C-4 


sel instruction C-4 
semaphore 2-16 
serial instructions C-5, C-9 
shared 
data 2-12, 2-14, 2-15, 5-1, 5-44, 5-47, 5-48, 9- 
8 
page 2-12, 2-13 
reference 2-12, 2-15 
shared dflush 2-16, 9-8 
short constant instruction format 3-2, 3-3, C-8, C- 
9 
sign extended 2-3 
slstrpd instruction 2-26, 2-27, 5-3, 5-53, D-1, D- 
12, D-14 
Small Address Compatibility Mode 2-9, 2-20, 2- 
22, 2-23, 5-1, 5-2, 7-1, D-15 
spl instruction 2-19, 2-23, C-4, C-9 
sqrt.d instruction C-6, I-8 
sqrt.s instruction C-6, I-8 
srca 3-1, 3-2 


Amdahl Key Computer Labs 


CONFIDENTIAL 


Index 


srcb 3-1, 3-2 

srcc 3-1 

srm instruction 2-19, 2-21, C-4, C-5, C-9 

stache 

instruction 2-15, C-1, C-7, C-8, C-10 

store instruction 2-12, 2-13, 2-27, 5-1, 5-2, 5-4, 5- 
53, C-6, C-11, D-12, D-13, D-14, D-15 

storecpu instruction 2-12, 2-27, 5-1, 5-2, D-12 

strap instruction 2-28, C-9, D-3, D-4 

sub-line 2-14, 5-3, 5-4, E-1, E-2, H-3 

sub.d instruction I-5 

sub.s instruction I-5 

subb instruction C4 

subt instruction D-7 

supervisor mode 2-9, 2-22 

swat instruction 2-12, 2-13, 2-16, 5-2, 5-53, D-12, 
D-13, D-14, D-15 

synchronization 2-16 


target 
branch 2-18, 9-4 
timers 2-23 
timing C-1 
Trace Enable 2-20, 2-21 
Trace Pending 2-20, 2-21, 2-22 
trace trap 2-25, 2-28 
trap 2-24, 2-26 
arithmetic 2-20, 2-21 
bpt instruction 2-25, 2-28, 2-29, D-3 
check 6-61 
data map miss 2-13, 2-24, 5-2, 5-3, 9-9 
data watchpoint 5-3, 5-4, 5-53 
decode 2-25, 2-26, 2-27, 2-28, D-1, D-3, D-4 
dflush-type 5-3 
ECC 5-2, 5-3, 5-4 
echk 2-19, 5-49, 5-50 
eload-type 5-3 
floating-point 2-26, 4-5, 4-13, D-1, D-3, D-4, 
D-7 
handling D-1 
illegal access 2-13, 5-2, 5-3, 5-4, 9-9 
illegal address 2-8, 2-19, 2-23, 5-1, 5-2, 5-3, 5- 
4 


illegal instruction/privilege violation 2-18, 2- 
25, 3-6, 7-1, 13-1 

imprecise 2-25, 2-26, 2-27 

in Trap State 2-26 

instruction fetch 2-26, 2-27, 2-28, D-1 

instruction map miss 2-17, 2-25, 2-28, 9-4, H-1 

integer 2-26, 4-5, D-1, D-7 

invalid operation 2-4, 4-1 

load-type 5-3 
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load/store 2-26, D-1, D-3, D-4, D-10, D-11, D- 
12 
memory-related parity error 5-3, 5-4 
NaN causing 2-4 
nonexistent memory 2-19, 5-2, 5-3, 5-4 
page map/cache tag parity error 5-3, 5-4 
precise 2-25, 2-27 
reset 2-26, 2-27, 2-29 
return from 2-28 
simultaneous D-1 
store-type 5-3 
strap instruction 2-28, D-3, D-4 
trace 2-21, 2-22, 2-25, 2-28 
trap instruction 2-25, 2-28, D-3, D-4 
vectoring 2-27, 2-29 
xtrap instruction 2-28, D-3, D-4 
zcl-type 5-3 
trap data 
load/store D-1 
primary D-1 
trap instruction 2-25, 2-28, C-9, D-3, D-4 
Trap Locator 2-26, 2-27, D-1, D-5 
trap sequence 2-26 
Trap State 2-9, 2-11, 2-15, 2-19, 2-20, 2-21, 2-22, 
2-23, 2-24, 2-26, 2-27, 2-28, 2-29, 5-53, 7- 
2, 7-3, 10-3, 10-8, C-9, C-10, D-8, D-9, D- 
10, D-14, D-15, H-1 
Trap Summary 2-23, 2-25, 2-26, 2-27, D-1 
trap type D-1 


unconditional short constant instruction format 3- 
2, 3-3 

undefined opcodes 13-1 

unordered 4-1 

uptime counter 2-23 

user mode 2-9, 2-22, 2-23 

User Mode Load 2-11, 2-12, 2-20, 2-22, 2-23, 5-1, 
5-44, 9-10, D-15 

User Mode Store 2-11, 2-12, 2-20, 2-22, 2-23, 5-1, 
5-44, 9-7, 9-8, 9-9, D-15 

User Protection 2-11, 2-12, 2-13, 2-20, 2-22, 2-23, 
5-1, 5-2, 9-7, 9-9, 9-10, D-15 


virtual addresses 2-9, 2-10 
virtual memory 2-9 


wdwp instruction 5-4, D-13 
welf instruction 2-7, 2-19, D-13 
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wfec instruction C-12, D-12 

wios instruction 2-23, C-12, D-13 

wit instruction 2-24, C-4, C-9 

wps instruction 2-19, 2-20, 2-21, 4-1, C-4, C-5, C- 
7, C-9, D-12, D-15 

wrapping C-4 


xtrap instruction 2-28, C-9, D-3, D-4 


zcl instruction 2-12, 2-13, 2-29, 5-3, 5-53, C-12, 
D-12 
zero extended 2-3 
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